SEQUENTIAL METHOD OF SAMPLING FOR DECIDING
BETWEEN TWO COURSES OF ACTION* 


ABRAHAM WALD 
Columbia University 


1. INTRODUCTION 


THE PROBLEM of deciding between two alternative courses of action
arises frequently in practice. Such a problem is a statistical one,
if the preference for one or the other action depends on some unknown 
population characteristic (parameter) and a decision is to be made on 
the basis of incomplete information given in the form of a random 
sample drawn from the population. For example, the problem may be 
to decide whether a lot consisting of a large number of units of some 
manufactured product should be accepted or rejected. Suppose that 
each unit is classified into one of the two categories, defective and non- 
defective, and that the preference for acceptance or rejection depends 
on the number of defectives in the lot. Thus, in this case the population 
is the lot and the unknown characteristic (parameter) of the population 
is the number of defectives in the lot. A statistical problem arises if
acceptance or rejection is to be decided on the basis of incomplete
information given in the form of a random sample drawn from the lot.

In this paper we shall be concerned with the problem of deciding 
between two courses of action on the basis of a random sample drawn 
from the population under consideration. A most frequently used 
method of sampling is that of single sampling, where the total number 
of observations, i.e., the size of the sample to be drawn from the
population, is determined in advance (before the experiment starts). For
instance, in our previous example we may specify the number of units to
be inspected in advance, and the decision of acceptance or rejection is
then based on the number of defectives found in the set of units
inspected. An essentially different method of sampling is that of
sequential sampling, where the number of observations to be made is
not predetermined but is dependent on the outcome of the observations.
The sequential method of sampling may be described as follows:
A rule is given for making one of the following three decisions at any
stage of the experiment (at the nth trial for each integral value of n):
(1) to take action 1, (2) to take action 2, (3) to continue the experiment
by making an additional observation. Thus, such a sampling procedure
is carried out sequentially. On the basis of the first observation one of
the aforementioned three decisions is made. If the third decision is
made, a second trial is performed. Again, on the basis of the first two
observations one of the three decisions is made. If the third decision is
made, a third trial is performed, etc. In the present paper we shall deal
exclusively with the sequential method of sampling.

* The Journal is privileged to present this exposition by Professor Wald of his paper in the June
1945 issue of the Annals of Mathematical Statistics. The sequential idea is not new, but Professor Wald's
sequential likelihood ratio method marks a new and important advance in statistics and possesses an
optimum character which he describes in the text. The method has wide application to many fields,
among which are sampling inspection of production output and experimental design. Although the
subject is mathematical, the present exposition is intended to be accessible to statisticians with little
mathematical background. Sections 5 and 6 require some mathematical background; readers who do not
possess it may omit these short sections, accepting their results without proof. (Ed.)

The theory of the sequential method of sampling was developed by 
the author of the present paper in 1943 and was published in June, 
1945 (see reference [1]). The purpose of the present paper is to give an
elementary exposition of the fundamental ideas together with some
illustrations and examples.


2. CONSEQUENCES OF THE CHOICE OF ANY 
PARTICULAR SAMPLING PLAN 


By a sampling plan we mean a rule which specifies at each stage of 
the experiment, at the nth observation for each integral value of n, 
which of the following three decisions should be made: (1) to take 
action 1, (2) to take action 2, (3) to make an additional observation. 
The process is, of course, continued until either decision 1 or decision 
2 is made. 

In what follows in this paper we shall assume that the successive
observations are independent observations drawn from the same population
and that the population distribution of the observed variable $x$
under consideration is known except for the value of a single unknown
parameter $\theta$, which may be the mean, the variance, or some other
characteristic of the distribution of $x$. Furthermore, it is assumed that the
preference for one of the two alternative actions depends only on the
value of this unknown parameter $\theta$. In the example mentioned in the
introduction, the population is the lot, and for each unit in the lot the
variable $x$ takes only two values, the value 1 if the unit is defective
and the value 0 if the unit is not defective. Here the only unknown
characteristic of the distribution of $x$ is the mean of $x$, i.e., the
proportion of defectives in the lot. If, instead of classifying the units into one
of the two categories, defectives and non-defectives, we measure the
value $x$ of some characteristic of the unit, such as the weight, or the
diameter, or the resistance to pressure, etc., then $x$ will be a continuous
variable which can take any value over a certain range. Suppose that
the distribution of $x$ in the lot is known to be normal with a known
standard deviation; then the only unknown characteristic of the
distribution of $x$ is the mean of $x$.


2.1 The Operating Characteristic Curve of a Sampling Plan 


If a sampling plan has been selected, the probability that action 1
will be taken and the probability that action 2 will be taken depend
only on the population distribution of the variable $x$. Since it is assumed
that the distribution of $x$ involves only one unknown parameter $\theta$,
the probability of taking action 1 (or that of taking action 2) will
depend only on the value of $\theta$ for any given sampling plan. For any
value $\theta$ we shall denote by $L(\theta)$ the probability that the sampling
procedure will terminate with the decision of taking action 1. We shall call
$L(\theta)$ the operating characteristic curve, or more briefly the OC curve,
of the sampling plan. In this paper we shall consider only sampling
plans for which the probability is zero that the sampling procedure will
go on indefinitely without deciding between the two courses of action.
For such sampling plans the probability that action 2 will be taken is
obviously given by $1 - L(\theta)$.

To give an example of an operating characteristic curve, consider
again the case where the population is a lot consisting of a large number
of units of a certain manufactured product and each unit is classified
into two categories, defective and non-defective. For each unit the
value of $x$ is put equal to 1 if the unit is defective, and zero if the unit
is not defective. Suppose that the following sampling plan has been
adopted: If the first 25 units inspected are non-defective, we stop
inspection and the lot is accepted (action 1). If for some value $m \le 25$
the $m$th unit inspected is found defective, no further units are inspected
and the lot is rejected (action 2). Denote by $\theta$ the unknown proportion
of defectives in the lot. Then the probability that $x = 1$ for a unit drawn
at random from the lot is equal to $\theta$, and the probability that $x = 0$
is equal to $1 - \theta$. Thus, the proportion $\theta$ of defectives is the only
unknown parameter involved in the distribution of $x$. We shall assume
that the number of units in the lot is so large that the successive
observations on $x$ may be considered as independent. Then the probability
that the lot will be accepted, i.e., the probability that the first 25
units inspected will be found non-defective, is equal to $(1 - \theta)^{25}$. Thus,
the operating characteristic curve of this sampling plan is given by

(2.1)    $L(\theta) = (1 - \theta)^{25}.$
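
As an illustration only, formula (2.1) can be evaluated numerically. The short Python sketch below is not part of the original exposition; the function name `oc_curtailed_plan` and the parameter `max_units` are hypothetical names chosen here, and the values of $\theta$ printed are arbitrary.

```python
def oc_curtailed_plan(theta: float, max_units: int = 25) -> float:
    """OC curve (2.1): probability that the lot is accepted (action 1).

    The lot is accepted only if the first `max_units` units inspected
    are all non-defective, each with probability 1 - theta.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must be a proportion in [0, 1]")
    return (1.0 - theta) ** max_units


if __name__ == "__main__":
    # Arbitrary illustrative proportions of defectives.
    for theta in (0.0, 0.01, 0.05, 0.10):
        print(f"theta = {theta:.2f}   L(theta) = {oc_curtailed_plan(theta):.4f}")
```

The curve falls quickly: under this plan a lot with 5 percent defectives would be accepted a little more than a quarter of the time.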
2.2 The Average (Expected) Number of Observations Required by a 
Sampling Plan 


Another important feature of a sampling plan is the number of
observations required by the plan. For the sampling plans we shall
consider here the number of observations required by the plan is not
predetermined, but depends on the outcome of the observations. For
example, for the particular sampling plan discussed in Section 2.1 the
number of observations to be made may be anything from 1 to 25.
If no defects are found during the sampling process, we shall make 25
observations. On the other hand, if the first $m - 1$ units inspected are
non-defective and the $m$th unit is defective for some value $m \le 25$,
then the total number of observations made will be equal to $m$. We
shall denote by $n$ the number of observations required by the sampling
plan. Thus, $n$ is a chance variable. Carrying out the same sampling
plan repeatedly, we shall obtain, in general, different values for $n$.
Of considerable interest is, of course, the expected value of $n$ (the
average value of $n$ in repeated applications of the sampling plan). For any
given sampling plan the expected number of observations required by
the plan will depend on the distribution of $x$. Since, by assumption,
there is only one unknown parameter $\theta$ involved in the distribution of
$x$, the expected value of $n$ will depend only on $\theta$. For any value $\theta$ we
shall denote the expected value of $n$ by $E_\theta(n)$. For any given sampling
plan the expected value $E_\theta(n)$ is a function of $\theta$ and we shall refer to it
as the average sample number curve, or, more briefly, as the ASN
curve.

As an example, we shall compute the ASN curve for the particular
sampling plan discussed in Section 2.1. The probability that exactly
$m$ ($m < 25$) units will be inspected is equal to the probability that the
first $m - 1$ units inspected are non-defective and the $m$th unit is found
defective. Clearly, the latter probability is equal to $(1 - \theta)^{m-1}\theta$. The
probability that exactly 25 units will be inspected is equal to
$(1 - \theta)^{24}\theta + (1 - \theta)^{25}$. Hence, the ASN curve is given by

(2.2)    $E_\theta(n) = \sum_{m=1}^{24} m\theta(1 - \theta)^{m-1} + 25\left[\theta(1 - \theta)^{24} + (1 - \theta)^{25}\right].$
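
Formula (2.2) can be checked numerically. The sketch below, in Python, evaluates (2.2) term by term and compares it with the closed form $(1 - (1 - \theta)^{25})/\theta$. That closed form is a geometric-series simplification added here for comparison and does not appear in the text; the function names are likewise hypothetical.

```python
def asn_curtailed_plan(theta: float, max_units: int = 25) -> float:
    """ASN curve of the Section 2.1 plan, evaluated term by term as in (2.2)."""
    q = 1.0 - theta
    # Stopping at m < max_units: first m - 1 units good, m-th unit defective.
    early = sum(m * theta * q ** (m - 1) for m in range(1, max_units))
    # Stopping at max_units: the last unit is either defective or good.
    last = max_units * (theta * q ** (max_units - 1) + q ** max_units)
    return early + last


def asn_closed_form(theta: float, max_units: int = 25) -> float:
    """Geometric-series simplification of (2.2); an addition, not from the text."""
    if theta == 0.0:
        return float(max_units)
    return (1.0 - (1.0 - theta) ** max_units) / theta


if __name__ == "__main__":
    for theta in (0.01, 0.05, 0.20):
        print(theta, asn_curtailed_plan(theta), asn_closed_form(theta))
```

The two functions should agree to rounding error, which gives a quick check that the terms of (2.2) have been transcribed correctly.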
With each sampling plan there is associated an OC curve and an
ASN curve. These two curves may be considered perhaps as the most 
important features of a sampling plan. Thus, in judging the relative 
merits of two different sampling plans we shall compare the OC and 
ASN curves of these two plans. 


3. PRINCIPLES FOR THE CHOICE OF A SAMPLING PLAN 


3.1 Dependence of the Preference for One or the Other Action on the
Value of $\theta$


There are infinitely many possible sampling plans and the problem
of selecting one of them is, of course, of fundamental importance. To
formulate some principles for the choice of a sampling plan, we need
to state what our preference would be for one or the other action if
the value of $\theta$ were known. This preference will, of course, vary with
the value of $\theta$. In the present paper we shall consider the case where our
preference for one or the other action depends on the value of $\theta$ in the
following manner: There exists a particular value $\theta'$ such that if $\theta < \theta'$
we prefer action 1 and if $\theta > \theta'$ we prefer action 2. If $\theta = \theta'$ we are
indifferent which action is taken. Furthermore, it is assumed that in the
domain $\theta < \theta'$ our preference for action 1 increases with decreasing value
of $\theta$ and that our preference for action 2 increases with increasing value
of $\theta$ in the domain $\theta > \theta'$. In other words, if $\theta$ is only slightly below $\theta'$
we have only a slight preference for action 1; while if $\theta$ is considerably
below $\theta'$ we strongly prefer action 1 to action 2. Similarly, if $\theta$ is only
slightly above $\theta'$, the preference for action 2 is only slight, while if $\theta$
is much larger than $\theta'$, action 2 is strongly preferred to action 1.¹

In many problems arising in practice the relation of the preference
for one or the other action to the parameter $\theta$ will satisfy the conditions
described above. For instance, if the problem is to decide whether a
lot be accepted or rejected the preference for acceptance or rejection
will depend on the proportion $\theta$ of defectives in the lot in the following
manner: there will be a value $\theta'$ such that if the proportion of defectives
is equal to $\theta'$ we are indifferent which action is taken. For $\theta < \theta'$ we
prefer acceptance and this preference will increase with decreasing
value of $\theta$. For $\theta > \theta'$ we prefer rejection and this preference will
increase with increasing value of $\theta$.

If the dependence of the preference for one or the other action is of
the type described above, we wish to make the probability that action
1 will be taken, i.e., the value of $L(\theta)$, as high as possible for $\theta < \theta'$
and as low as possible for $\theta > \theta'$. Thus, an ideal OC curve, as shown in
Figure 1, would be given by a function $L(\theta)$ such that $L(\theta) = 1$ for $\theta < \theta'$
and $L(\theta) = 0$ for $\theta > \theta'$.

[FIGURE 1. Ideal OC curve: $L(\theta)$ plotted against $\theta$.]

While this ideal form of the OC curve can never be achieved on the
basis of incomplete information about $\theta$ supplied by a random sample
drawn from the population, it can be approached arbitrarily near if
we are willing to take a sufficiently large sample.

¹ The case when the preference is not of the type described here, or when the number of unknown
parameters is more than one, is discussed in [1].

A sampling plan is the more desirable the nearer its OC curve to the 
ideal curve, and the smaller the average number of observations re- 
quired by the plan. These two desirable features of a sampling plan are 
somewhat conflicting, since the better we want to approach the ideal 
form of the OC curve, the larger will be, in general, the number of ob- 
servations to be made. To achieve a compromise between these two 
conflicting desiderata, one may proceed as follows: First we formulate 
requirements as to how near the OC curve should be to the ideal curve 
and then consider only sampling plans which satisfy these require- 
ments. From these sampling plans we try to select one for which the 
average number of observations required by the plan is as small as 
possible. To impose some conditions on the OC curve first and then to 
minimize with respect to the average number of observations does not 
seem to be an unreasonable procedure, since the OC curve is perhaps 
of primary importance. 


3.2 Requirements on the OC Curve 


As stated in Section 3.1, there exists a value $\theta'$ such that for $\theta < \theta'$
we prefer action 1 and this preference increases with decreasing value
of $\theta$. Thus, it will be possible, in general, to find a value $\theta_0 < \theta'$ such
that for values $\theta \le \theta_0$ the error committed by taking action 2 instead
of action 1 is considered of practical importance. For values $\theta > \theta'$
action 2 is preferred and this preference increases with increasing value
of $\theta$. Thus, it will be possible to determine a value $\theta_1 > \theta'$ such that for
values $\theta \ge \theta_1$ the error committed by taking action 1 instead of action 2
is considered of practical importance. After these two values $\theta_0$ and
$\theta_1$ ($\theta_0 < \theta_1$) have been selected, the requirements on the OC curve may
reasonably be stated as follows: For any value $\theta \le \theta_0$ the probability
that action 2 will be taken should be less than or equal to a preassigned
value $\alpha$, and for any value $\theta \ge \theta_1$ the probability that action 1 will be
taken should be less than or equal to a preassigned value $\beta$. Since the
probability that action 1 will be taken is equal to $L(\theta)$, and the
probability that action 2 will be taken is equal to $1 - L(\theta)$, our requirements
can be stated by the following inequalities:


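(3.1)    $1 - L(\theta) \le \alpha$  for  $\theta \le \theta_0$

and

(3.2)    $L(\theta) \le \beta$  for  $\theta \ge \theta_1.$

Inequality (3.1) can also be written as

(3.3)    $L(\theta) \ge 1 - \alpha$  for  $\theta \le \theta_0.$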


A typical OC curve satisfying the above requirements is shown in
Figure 2.

[FIGURE 2. Typical OC curve: $L(\theta)$ plotted against $\theta$.]

Thus, our requirements on the OC curve are expressed in terms of
four values, $\theta_0$, $\theta_1$, $\alpha$, and $\beta$. The ordinate $L(\theta)$ of the OC curve is
required to be $\le \beta$ for $\theta \ge \theta_1$ and $\ge 1 - \alpha$ for $\theta \le \theta_0$. The choice of the values
$\theta_0$, $\theta_1$, $\alpha$, and $\beta$ is not a statistical problem. The selection of these four
values is to be made on the basis of practical considerations in each
particular case. We shall call a sampling plan admissible if it satisfies
the conditions (3.2) and (3.3) for the values $\theta_0$, $\theta_1$, $\alpha$, and $\beta$ selected.
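
To make the notion of admissibility concrete, the following Python sketch checks conditions (3.2) and (3.3) for a given OC curve. It is only an approximate check under stated assumptions: $\theta$ is taken to be a proportion in $[0, 1]$, the conditions are tested on a finite grid rather than for every $\theta$, and the names `is_admissible` and `oc_section_2_1` as well as the numerical values are hypothetical choices made here for illustration.

```python
from typing import Callable


def is_admissible(oc: Callable[[float], float], theta0: float, theta1: float,
                  alpha: float, beta: float, grid: int = 200) -> bool:
    """Grid check of (3.3) on [0, theta0] and of (3.2) on [theta1, 1].

    Assumes theta is a proportion; a finite grid can miss violations
    between grid points, so this is a sketch rather than a proof.
    """
    low = [theta0 * i / grid for i in range(grid + 1)]
    high = [theta1 + (1.0 - theta1) * i / grid for i in range(grid + 1)]
    meets_3_3 = all(oc(t) >= 1.0 - alpha for t in low)   # L >= 1 - alpha
    meets_3_2 = all(oc(t) <= beta for t in high)         # L <= beta
    return meets_3_3 and meets_3_2


def oc_section_2_1(theta: float) -> float:
    """OC curve (2.1) of the 25-unit plan of Section 2.1."""
    return (1.0 - theta) ** 25


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    print(is_admissible(oc_section_2_1, 0.002, 0.10, 0.05, 0.10))
    print(is_admissible(oc_section_2_1, 0.010, 0.10, 0.05, 0.10))
```

With these illustrative values the plan of Section 2.1 passes when $\theta_0 = .002$ but fails when $\theta_0 = .01$, since $L(.01) = (.99)^{25}$ is well below $.95$.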


3.3 Choice of a Sampling Plan 


After the four quantities $\theta_0$, $\theta_1$, $\alpha$, and $\beta$ have been selected, we
consider only sampling plans which are admissible, i.e., which satisfy the
conditions (3.2) and (3.3). Of course, we wish to select a sampling plan
for which the expected value of the number of observations required
by the plan is as small as possible. This expected value $E_\theta(n)$ depends,
as we have seen in Section 2.2, on the value of $\theta$. In Section 2.2 we
referred to the function $E_\theta(n)$ as the ASN curve of the sampling plan. In
general, it will not be possible to find an admissible sampling plan which
has a uniformly best ASN curve, i.e., a plan for which $E_\theta(n)$ is
minimized for all values of $\theta$ as compared with any other admissible plan.
Thus, some compromise is to be made. The sampling plans discussed
in the next section have the property that $E_\theta(n)$ is minimized
simultaneously for $\theta = \theta_0$ and $\theta = \theta_1$.


4. THE SEQUENTIAL PROBABILITY RATIO SAMPLING PLAN 


In what follows we shall assume that the chance variable $x$ under
consideration has a discrete distribution (can take only discrete values,
for example, only integral values, with positive probabilities), or has a
continuous distribution admitting a continuous probability density
function. For any value $\theta$ we shall denote by $f(x, \theta)$ the corresponding
probability distribution of $x$. If the distribution of $x$ is discrete, $f(x, \theta)$
is the probability that the chance variable under consideration takes
the value $x$. If the distribution of $x$ is continuous, by $f(x, \theta)$ we mean the
probability density function of $x$. For simplicity, the following
discussions will be based on the assumption that the distribution of $x$ is
discrete, although the arguments can easily be extended to include the
continuous case. In the continuous case we have to replace the discrete
distribution of $x$ by the probability density of $x$ and all results remain
valid.

4.1 Definition of the Sequential Probability Ratio Sampling Plan 


After the quantities $\theta_0$, $\theta_1$, $\alpha$, and $\beta$ have been chosen the
corresponding sequential probability ratio test is defined as follows: Denote the
successive observations on $x$ by $x_1, x_2, x_3, \ldots$, etc. For any integral
value $m$, the probability that we shall obtain a sample equal to the
observed $(x_1, \ldots, x_m)$ is given by

(4.1)    $p_m = f(x_1, \theta) f(x_2, \theta) \cdots f(x_m, \theta).$

We shall denote this probability by $p_{1m}$ for $\theta = \theta_1$, and by $p_{0m}$ for $\theta = \theta_0$,
i.e.,

(4.2)    $p_{0m} = f(x_1, \theta_0) f(x_2, \theta_0) \cdots f(x_m, \theta_0)$

and

(4.3)    $p_{1m} = f(x_1, \theta_1) f(x_2, \theta_1) \cdots f(x_m, \theta_1).$

At each stage of the experiment, at the $m$th observation for each
integral value of $m$, the probability ratio $p_{1m}/p_{0m}$ is computed. We
continue taking observations as long as the ratio $p_{1m}/p_{0m}$ is neither
sufficiently small nor sufficiently large. If for some value of $m$ the ratio
$p_{1m}/p_{0m}$ is less than or equal to a preassigned value $B$, the process is
terminated with the decision of taking action 1 (the action which is
preferable when $\theta < \theta'$). If for some value of $m$ the ratio $p_{1m}/p_{0m}$ is
greater than or equal to a preassigned constant $A$, we terminate the
process with the decision of taking action 2 (the action which is
preferable when $\theta > \theta'$). In other words, two constants $A$ and $B$ are chosen
such that $0 < B < 1 < A$ and the sampling process is then carried out as
follows: At each stage of the experiment the ratio $p_{1m}/p_{0m}$ is computed.
If

(4.4)    $B < \frac{p_{1m}}{p_{0m}} < A,$

an additional observation is taken. If

(4.5)    $\frac{p_{1m}}{p_{0m}} \le B,$

the process is terminated with the decision of taking action 1. If

(4.6)    $\frac{p_{1m}}{p_{0m}} \ge A,$

the process is terminated with the decision of taking action 2.

The constants $A$ and $B$ are determined so that the probability that
action 1 will be taken is equal to $1 - \alpha$ when $\theta = \theta_0$ and is equal to $\beta$
when $\theta = \theta_1$. In other words, the constants $A$ and $B$ are determined so
that

(4.7)    $L(\theta_0) = 1 - \alpha$

and

(4.8)    $L(\theta_1) = \beta.$

The problem of determining $A$ and $B$ so that (4.7) and (4.8) are satisfied
will be discussed in Section 4.2.

In most of the important cases occurring in practice, such as when $x$
has a normal, binomial, or Poisson distribution, etc., the OC curve of
the sequential probability ratio plan defined by the inequalities (4.4),
(4.5), and (4.6) will satisfy the requirements (3.1) and (3.2). This is,
of course, not a sufficient justification for the use of the sequential
probability ratio plan. The merit of any plan (satisfying the
requirements (3.1) and (3.2)) can be judged on the basis of its ASN curve.
It will be seen in Section 4.3 that the ASN curve of the sequential
probability ratio plan has some optimum properties.

For purposes of practical computation, it is much more convenient
to compute the logarithm of the ratio $p_{1m}/p_{0m}$ than the ratio $p_{1m}/p_{0m}$
itself. The reason for this is that $\log (p_{1m}/p_{0m})$ can be written as the
sum of $m$ terms, i.e.,

(4.9)    $\log \frac{p_{1m}}{p_{0m}} = \log \frac{f(x_1, \theta_1)}{f(x_1, \theta_0)} + \log \frac{f(x_2, \theta_1)}{f(x_2, \theta_0)} + \cdots + \log \frac{f(x_m, \theta_1)}{f(x_m, \theta_0)}.$

We shall denote the $i$th term in this cumulative sum by $z_i$, i.e.,

(4.10)    $z_i = \log \frac{f(x_i, \theta_1)}{f(x_i, \theta_0)}.$

Using the quantities $z_i$ ($i = 1, 2, \ldots$), the sampling process is carried
out as follows: At each stage of the experiment, at the $m$th observation
for each integral value of $m$, the cumulative sum $z_1 + \cdots + z_m$ is
computed. If

(4.11)    $\log B < z_1 + \cdots + z_m < \log A,$

an additional observation is taken. If

(4.12)    $z_1 + \cdots + z_m \le \log B,$

the process is terminated with the decision of taking action 1. If

(4.13)    $z_1 + \cdots + z_m \ge \log A,$

the process is terminated with the decision of taking action 2.
Applications of the sequential probability ratio plan to various
particular problems will be discussed in Sections 7, 8, and 9.
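
The rule (4.11) to (4.13) can be sketched in a few lines of Python for the defective/non-defective example, where $f(1, \theta) = \theta$ and $f(0, \theta) = 1 - \theta$. This is a minimal illustration, not the author's computing scheme: the function name `sprt_bernoulli` is hypothetical, and the boundaries $A = 19$, $B = 1/19$ and the values $\theta_0 = .05$, $\theta_1 = .20$ used in the demonstration are arbitrary.

```python
import math
from typing import Iterable, Optional, Tuple


def sprt_bernoulli(observations: Iterable[int], theta0: float, theta1: float,
                   A: float, B: float) -> Tuple[Optional[int], int]:
    """Apply (4.11)-(4.13) to 0/1 observations (1 = defective).

    Returns (action, m): action is 1 or 2, or None if the supplied
    observations are exhausted before a boundary is reached; m is the
    number of observations used.
    """
    log_a, log_b = math.log(A), math.log(B)
    z_defective = math.log(theta1 / theta0)                  # z_i when x_i = 1
    z_good = math.log((1.0 - theta1) / (1.0 - theta0))       # z_i when x_i = 0
    total = 0.0
    m = 0
    for m, x in enumerate(observations, start=1):
        total += z_defective if x == 1 else z_good
        if total <= log_b:      # (4.12): take action 1
            return 1, m
        if total >= log_a:      # (4.13): take action 2
            return 2, m
    return None, m              # (4.11) held throughout: no decision yet


if __name__ == "__main__":
    # Illustrative boundaries and parameter values, chosen arbitrarily.
    print(sprt_bernoulli([0] * 30, 0.05, 0.20, A=19.0, B=1.0 / 19.0))
    print(sprt_bernoulli([1] * 30, 0.05, 0.20, A=19.0, B=1.0 / 19.0))
```

Working with the cumulative sum of the $z_i$ means each new observation costs one addition and two comparisons, which is the practical convenience the text points to.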







4.2 Determination of the Constants A and B 


As stated in Section 4.1, the constants $A$ and $B$ are to be determined
so that the probability that action 1 will be taken is equal to $1 - \alpha$ when
$\theta = \theta_0$ and is equal to $\beta$ when $\theta = \theta_1$. An exact determination of the
values $A$ and $B$ is rather involved. However, an approximation which is
sufficiently good for most practical purposes can easily be derived. A
rigorous and detailed discussion of this problem is given in Sections 3.2
and 3.3 of [1]. Here we shall merely indicate the derivation of the
approximate values of $A$ and $B$.

It follows from inequality (4.6) that for any given sample which leads
to the decision of taking action 2, the probability of obtaining such a
sample when $\theta = \theta_1$ is at least $A$ times as large as the probability of
obtaining such a sample when $\theta = \theta_0$, i.e.,

(4.14)    $p_{1m} \ge A p_{0m}.$

Thus, also the probability of the totality of all samples which lead to
the decision of taking action 2 is at least $A$ times as large when $\theta = \theta_1$
as when $\theta = \theta_0$. But the probability of the totality of such samples is
the same as the probability of taking action 2. The latter probability is
equal to $1 - \beta$ when $\theta = \theta_1$ and is equal to $\alpha$ when $\theta = \theta_0$. Hence, we
obtain the inequality

(4.15)    $1 - \beta \ge A\alpha.$

This inequality can be written as

(4.16)    $A \le \frac{1 - \beta}{\alpha}.$

Thus, $(1 - \beta)/\alpha$ is an upper limit for $A$.

A lower limit for $B$ can be derived in a similar way. In fact, from
inequality (4.5) it follows that for any given sample which leads to the
decision of taking action 1, the probability of obtaining such a sample
when $\theta = \theta_1$ is at most $B$ times as large as the probability of obtaining
such a sample when $\theta = \theta_0$. Thus, the probability of taking action 1 is
at most $B$ times as large when $\theta = \theta_1$ as when $\theta = \theta_0$. Since the probability
of taking action 1 is equal to $1 - \alpha$ when $\theta = \theta_0$ and to $\beta$ when $\theta = \theta_1$, we
obtain the inequality

(4.17)    $\beta \le (1 - \alpha)B.$

This inequality can be written as

(4.18)    $B \ge \frac{\beta}{1 - \alpha}.$

Thus, $\beta/(1 - \alpha)$ is a lower limit for $B$.





































The reason that (4.16) and (4.18) are inequalities instead of equalities
is that the sequential process may terminate with $p_{1m}/p_{0m} > A$ or
$p_{1m}/p_{0m} < B$. If at the final stage $p_{1m}/p_{0m}$ were exactly equal to $A$
or $B$, then the equality signs would hold in (4.16) and (4.18). If $f(x, \theta_1)$
(the distribution of $x$ when $\theta = \theta_1$) is near $f(x, \theta_0)$ (the distribution of $x$
when $\theta = \theta_0$), it is almost certain that the value of $p_{1m}/p_{0m}$ is changed
only slightly by one additional observation. Thus, at the final stage
$p_{1m}/p_{0m}$ will be only slightly above $A$, or slightly below $B$, and
consequently $A$ and $B$ will be nearly equal to $(1 - \beta)/\alpha$ and $\beta/(1 - \alpha)$,
respectively. Hence, for practical purposes we may put $A = (1 - \beta)/\alpha$ and
$B = \beta/(1 - \alpha)$. The nature of this approximation is more fully discussed
in Section 3.3 of [1].
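
The approximate boundaries just derived are easy to compute. The Python sketch below simply evaluates $A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$ together with their logarithms, which are the quantities needed in (4.11) to (4.13). The function name `approximate_boundaries` and the values $\alpha = \beta = .05$ are illustrative assumptions.

```python
import math
from typing import Tuple


def approximate_boundaries(alpha: float, beta: float) -> Tuple[float, float]:
    """Approximate constants A = (1 - beta)/alpha and B = beta/(1 - alpha)."""
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0 and alpha + beta < 1.0):
        raise ValueError("need 0 < alpha, beta and alpha + beta < 1")
    return (1.0 - beta) / alpha, beta / (1.0 - alpha)


if __name__ == "__main__":
    A, B = approximate_boundaries(0.05, 0.05)   # illustrative error levels
    print(f"A = {A:.4f}, log A = {math.log(A):.4f}")
    print(f"B = {B:.4f}, log B = {math.log(B):.4f}")
```

The condition $\alpha + \beta < 1$ imposed in the sketch guarantees $0 < B < 1 < A$, as required in Section 4.1.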


4.3 Efficiency of the Sequential Probability Ratio Sampling Plan 


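In Section 4.7 of [1] the following theorem has been proved: For
any sampling plan for which $L(\theta_0) = 1 - \alpha$ and $L(\theta_1) = \beta$, we have

(4.19)    $E_{\theta_0}(n) \ge \frac{(1 - \alpha) \log \frac{\beta}{1 - \alpha} + \alpha \log \frac{1 - \beta}{\alpha}}{E_0(z)}$

and

(4.20)    $E_{\theta_1}(n) \ge \frac{\beta \log \frac{\beta}{1 - \alpha} + (1 - \beta) \log \frac{1 - \beta}{\alpha}}{E_1(z)}$

where $z = \log [f(x, \theta_1)/f(x, \theta_0)]$, $E_0(z)$ denotes the expected value of $z$ when
$\theta = \theta_0$, and $E_1(z)$ denotes the expected value of $z$ when $\theta = \theta_1$.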

It will be seen in Section 6 of this paper that $E_{\theta_0}(n)$ and $E_{\theta_1}(n)$ for
the sequential probability ratio plan are very nearly equal to the
right-hand members of (4.19) and (4.20), respectively. Thus, for all practical
purposes we can say that the average number of observations takes its
minimum value for the sequential probability ratio sampling plan in
both cases, when $\theta = \theta_0$ and when $\theta = \theta_1$.

In most cases the ASN curve of the sequential probability ratio plan
will have the further property that $E_\theta(n)$ is increasing with increasing $\theta$
over the range $\theta \le \theta_0$, and $E_\theta(n)$ is decreasing with increasing $\theta$ over the
range $\theta \ge \theta_1$.

In Section 4.3 of [1] the expected number of observations required
by the sequential probability ratio sampling plan is compared with the
fixed number of observations required by the current single sampling
method. The comparison was carried out in the case when $x$ is normally
distributed with known variance. According to the method of single
sampling, a predetermined number $n$ of observations is taken. If the
arithmetic mean $\bar{x}$ of the $n$ observations is less than or equal to a
properly chosen constant $d$, action 1 is taken, and if $\bar{x} > d$, action 2 is taken.
The fixed number $n$ of observations to be taken and the constant $d$ are
determined so that the probability of taking action 1 is equal to $1 - \alpha$
when $\theta = \theta_0$, and is equal to $\beta$ when $\theta = \theta_1$. Since the fixed number $n$ of
observations required by the single sampling plan depends on $\alpha$ and $\beta$,
we shall denote it by $n(\alpha, \beta)$. The average saving of the sequential
probability ratio plan as compared with the single sampling plan is
$100(1 - E_{\theta_1}(n)/n(\alpha, \beta))$ percent if $\theta = \theta_1$, and $100(1 - E_{\theta_0}(n)/n(\alpha, \beta))$
percent if $\theta = \theta_0$. For the computation of $E_{\theta_1}(n)$ and $E_{\theta_0}(n)$ approximation
formulas have been used, the error being negligible when $\theta_1$ is near $\theta_0$.
It turns out that the ratios $E_{\theta_1}(n)/n(\alpha, \beta)$ and $E_{\theta_0}(n)/n(\alpha, \beta)$ do not
depend on $\theta_0$ and $\theta_1$; they depend only on $\alpha$ and $\beta$. In Table 1 the
expression $100(1 - E_{\theta_1}(n)/n(\alpha, \beta))$ is shown in Panel A, and the expression
$100(1 - E_{\theta_0}(n)/n(\alpha, \beta))$ in Panel B, for several values of $\alpha$ and $\beta$.
Because of the symmetry of the normal distribution, Panel B is obtained
from Panel A by interchanging $\alpha$ and $\beta$.

TABLE 1
AVERAGE PERCENTAGE SAVING OF SEQUENTIAL PROBABILITY RATIO SAMPLING
PLAN AS COMPARED WITH SINGLE SAMPLING PLAN WHEN THE UNKNOWN
PARAMETER $\theta$ IS THE MEAN OF A NORMAL DISTRIBUTION
WITH KNOWN VARIANCE

A. When $\theta = \theta_1$ (columns: $\alpha$; rows: $\beta$)

  β \ α | .01 | .02 | .03 | .04 | .05
  .01   |  58 |  60 |  61 |  62 |  63
  .02   |  54 |  56 |  57 |  58 |  59
  .03   |  51 |  53 |  54 |  55 |  55
  .04   |  49 |  50 |  51 |  52 |  53
  .05   |  47 |  49 |  50 |  50 |  51

B. When $\theta = \theta_0$ (columns: $\alpha$; rows: $\beta$)

  β \ α | .01 | .02 | .03 | .04 | .05
  .01   |  58 |  54 |  51 |  49 |  47
  .02   |  60 |  56 |  53 |  50 |  49
  .03   |  61 |  57 |  54 |  51 |  50
  .04   |  62 |  58 |  55 |  52 |  50
  .05   |  63 |  59 |  55 |  53 |  51

As can be seen from the table, for the range of $\alpha$ and $\beta$ from .01 to
.05 (the range most frequently employed), the sequential sampling
leads to an average saving of at least 47 percent in the necessary
number of observations as compared with the single sampling method. If
$\theta < \theta_0$ or $\theta > \theta_1$, the saving is even greater than the percentage shown
in Table 1.
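
The entries of Table 1 can be reproduced approximately with the Python sketch below. It rests on assumptions that are not spelled out in the text and should be read as such: the expected sample numbers are taken equal to the right-hand members of (4.19) and (4.20); for a normal mean with known variance $\sigma^2$ the expected values of $z$ are taken as $E_0(z) = -\delta^2/2$ and $E_1(z) = \delta^2/2$ with $\delta = (\theta_1 - \theta_0)/\sigma$; and the single-sample size is taken as the usual normal-theory value $n(\alpha, \beta) = (u_\alpha + u_\beta)^2/\delta^2$, not rounded up to an integer, where $u_p$ is the upper $p$ point of the standard normal distribution. The function name `saving_percent` is hypothetical.

```python
import math
from statistics import NormalDist


def saving_percent(alpha: float, beta: float, at: str = "theta1") -> float:
    """Approximate percentage saving of the sequential plan (normal mean case).

    delta = (theta1 - theta0)/sigma cancels in the ratio, so it is set to 1.
    """
    log_a = math.log((1.0 - beta) / alpha)      # log A, approximate boundary
    log_b = math.log(beta / (1.0 - alpha))      # log B, approximate boundary
    if at == "theta0":
        # Right-hand member of (4.19) with E_0(z) = -1/2.
        asn = ((1.0 - alpha) * log_b + alpha * log_a) / (-0.5)
    elif at == "theta1":
        # Right-hand member of (4.20) with E_1(z) = +1/2.
        asn = (beta * log_b + (1.0 - beta) * log_a) / 0.5
    else:
        raise ValueError("at must be 'theta0' or 'theta1'")
    upper = NormalDist().inv_cdf
    fixed_n = (upper(1.0 - alpha) + upper(1.0 - beta)) ** 2
    return 100.0 * (1.0 - asn / fixed_n)


if __name__ == "__main__":
    for a, b in ((0.01, 0.01), (0.05, 0.01), (0.05, 0.05)):
        print(a, b, round(saving_percent(a, b, "theta1")),
              round(saving_percent(a, b, "theta0")))
```

Under these assumptions the rounded values agree with the corresponding table entries, for example 58 for $\alpha = \beta = .01$ and 51 for $\alpha = \beta = .05$ in both panels.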


5. DERIVATION OF THE OC CURVE FOR ANY SEQUENTIAL 
PROBABILITY RATIO SAMPLING PLAN 





In Section 3.4 of [1] an approximation formula for the OC curve
has been derived, neglecting the excess of the probability ratio $p_{1m}/p_{0m}$
over the boundaries $A$ and $B$ at the termination of the sampling
process.² In addition to the approximation formula, lower and upper limits
for the OC curve were also derived in Section 3.4 of [1]. Here we shall
merely outline a derivation of the approximation formula for the OC
curve, neglecting the excess of the probability ratio $p_{1m}/p_{0m}$ over the
boundaries $A$ and $B$.

Consider the expression 


(5.1) F = =|" 
f(z, 60) 


where A(@) is determined so that h(@)~0 and the expected value of the 
expression (5.1) (when @ is the true value of the parameter) is equal to 
1, i.e.* 


(5.2) E fe, | 








f(z, oy = 1 
f(x, %)J 


It was shown elsewhere (see for instance Lemma 2 in [2]) that under 
some slight restrictions on f(z, @) for any value @ there exists exactly one 
real value h(@)~0 such that (5.2) holds. 

Thus, for any given value @, the function 


f(x, AYO 
(5.3) f*(z, 0) = f(z, 6) Fad 


2 For the special case of a binomial distribution the OC curve was first derived by C. M. Stockman. 
Independently of Stockman, the same result was also obtained by G. W. Brown and M. Friedman 
(independently of each other). 

8 If the distribution of z is continuous then equation (5.2) is to be replaced by 


2 f(x, ) VO 
I(x, ®) dz=1. 
-« (zx, 09) 
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is a distribution function. 
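We shall first consider the case when $h(\theta) > 0$. Since

$$\frac{f^*(x, \theta)}{f(x, \theta)} = \left[\frac{f(x, \theta_1)}{f(x, \theta_0)}\right]^{h(\theta)},$$

the sequential probability ratio plan corresponding to $A$, $B$, $f(x, \theta_0)$,
and $f(x, \theta_1)$ is identical with the sequential probability ratio plan
corresponding to $A^{h(\theta)}$, $B^{h(\theta)}$, $f(x, \theta)$, and $f^*(x, \theta)$, where $f(x, \theta)$ and
$f^*(x, \theta)$ take the places of $f(x, \theta_0)$ and $f(x, \theta_1)$, respectively. Our aim is to
determine the value of $L(\theta)$, i.e., the probability that action 1 will be taken when
$f(x, \theta)$ is the distribution of $x$. Applying our results in Section 4.2 to
the sequential sampling plan corresponding to $A^{h(\theta)}$, $B^{h(\theta)}$, $f(x, \theta)$,
and $f^*(x, \theta)$, we see that the following equations hold with good
approximation:

$$A^{h(\theta)} = \frac{1 - \beta'}{\alpha'} \tag{5.4}$$

$$B^{h(\theta)} = \frac{\beta'}{1 - \alpha'} \tag{5.5}$$

Here $\alpha'$ denotes the probability that action 2 will be taken when $f(x, \theta)$
is the true distribution of $x$, and $\beta'$ is the probability that action 1 will
be taken when $f^*(x, \theta)$ is the true distribution of $x$. From equations
(5.4) and (5.5) we obtain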





$$\alpha' = \frac{1 - B^{h(\theta)}}{A^{h(\theta)} - B^{h(\theta)}}. \tag{5.6}$$

Since $\alpha' = 1 - L(\theta)$, we get

$$L(\theta) = \frac{A^{h(\theta)} - 1}{A^{h(\theta)} - B^{h(\theta)}}. \tag{5.7}$$

It can easily be shown that (5.7) remains valid also when $h(\theta) < 0$.
Thus, the OC curve is given by the equation (5.7). Applications of the
formula to various special cases will be given in Sections 7, 8, and 9.
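
Formula (5.7) is easy to evaluate numerically once $h(\theta)$ is known. The following is a minimal Python sketch, not part of the original paper. The function name `oc_probability`, the illustrative risks, and the treatment of the limiting case $h \to 0$ are assumptions made for the example; the limit is obtained by expanding numerator and denominator of (5.7) to first order in $h$.

```python
import math

def oc_probability(a: float, b: float, h: float) -> float:
    """Approximate L(theta) from formula (5.7) for a given h = h(theta)."""
    if abs(h) < 1e-9:
        # Limiting value of (5.7) as h -> 0. This limit is an added
        # assumption of the sketch, not a formula stated in the text.
        return math.log(a) / (math.log(a) - math.log(b))
    return (a**h - 1.0) / (a**h - b**h)

alpha, beta = 0.02, 0.03       # illustrative risks
a = (1 - beta) / alpha         # approximate A from Section 4.2
b = beta / (1 - alpha)         # approximate B from Section 4.2

print(oc_probability(a, b, 1.0))    # equals 1 - alpha up to rounding
print(oc_probability(a, b, -1.0))   # equals beta up to rounding
```

The two printed values serve as a consistency check: $h = 1$ satisfies (5.2) when $\theta = \theta_0$ and $h = -1$ satisfies it when $\theta = \theta_1$, so (5.7) should return $1 - \alpha$ and $\beta$ at these two points.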


6. DERIVATION OF THE ASN CURVE FOR ANY SEQUENTIAL 
PROBABILITY RATIO SAMPLING PLAN 


Denote by $n$ the number of observations required by the sequential
probability ratio sampling plan. Let $N$ be a sufficiently large integer
so that the probability that $n \geq N$ is very small and can be neglected.⁴


⁴ For a rigorous and detailed discussion see Section 4.1 in [1].











Thus, we shall assume that $n < N$. Then we can write

$$z_1 + \cdots + z_N = (z_1 + \cdots + z_n) + (z_{n+1} + \cdots + z_N) \tag{6.1}$$

where $z_\alpha = \log [f(x_\alpha, \theta_1)/f(x_\alpha, \theta_0)]$. Taking expected value on both sides,
we obtain

$$N\,Ez = E(z_1 + \cdots + z_n) + E(z_{n+1} + \cdots + z_N) \tag{6.2}$$

where $z = \log [f(x, \theta_1)/f(x, \theta_0)]$. Since for $\alpha > n$ the variate $z_\alpha$ is distributed
independently of $n$, we have

$$E(z_{n+1} + \cdots + z_N) = E(N - n)\,Ez = N\,Ez - En\,Ez. \tag{6.3}$$

From (6.2) and (6.3) it follows that

$$E(z_1 + \cdots + z_n) - En\,Ez = 0. \tag{6.4}$$

Hence

$$En = \frac{E(z_1 + \cdots + z_n)}{Ez}. \tag{6.5}$$

Assume that $f(x, \theta)$ is the true distribution of $x$. Then $En = E_\theta(n)$.
Denote by $E_\theta(z)$ the expected value of $z$ when $f(x, \theta)$ is the distribution
function of $x$. Neglecting the excess of the probability ratio $p_{1n}/p_{0n}$
over the boundaries $A$ and $B$, the variate $(z_1 + \cdots + z_n)$ can take only
the values $\log A$ and $\log B$ with probabilities $1 - L(\theta)$ and $L(\theta)$, respectively.
Hence

$$E(z_1 + \cdots + z_n) = L(\theta) \log B + (1 - L(\theta)) \log A. \tag{6.6}$$

From (6.5) and (6.6) we obtain the final formula

$$E_\theta(n) = \frac{L(\theta) \log B + (1 - L(\theta)) \log A}{E_\theta(z)}. \tag{6.7}$$


This is, of course, only an approximation formula, since the excess
of the probability ratio $p_{1n}/p_{0n}$ over the boundaries $A$ and $B$ has been
neglected. Upper and lower limits for $E_\theta(n)$ are given in Section 4.1 of
[1]. Applications of the formula (6.7) to various special cases will be
given in Sections 7, 8, and 9.
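
Relation (6.4) can be checked empirically by simulating a plan many times. The Python sketch below is an illustration under stated assumptions, not part of the original paper. It assumes a two-category (Bernoulli) observation with the illustrative values $\theta_0 = .10$, $\theta_1 = .30$, $\alpha = .02$, $\beta = .03$, uses the approximate boundaries $A = (1-\beta)/\alpha$ and $B = \beta/(1-\alpha)$, and the names `run_sprt` and `check_identity` are hypothetical. It compares the sample mean of $z_1 + \cdots + z_n$ with the sample mean of $n$ multiplied by $Ez$.

```python
import math
import random

def run_sprt(theta, theta0, theta1, log_a, log_b, rng):
    """Run one sequential plan on simulated 0/1 observations.

    Returns (n, s): the number of observations taken and the value of
    z_1 + ... + z_n at termination.
    """
    z_one = math.log(theta1 / theta0)                 # z when x = 1
    z_zero = math.log((1 - theta1) / (1 - theta0))    # z when x = 0
    n, s = 0, 0.0
    while log_b < s < log_a:
        n += 1
        s += z_one if rng.random() < theta else z_zero
    return n, s

def check_identity(theta=0.10, theta0=0.10, theta1=0.30,
                   alpha=0.02, beta=0.03, trials=20000, seed=1):
    """Return (mean of z_1+...+z_n, mean of n times E z) over many runs."""
    rng = random.Random(seed)
    log_a = math.log((1 - beta) / alpha)
    log_b = math.log(beta / (1 - alpha))
    runs = [run_sprt(theta, theta0, theta1, log_a, log_b, rng)
            for _ in range(trials)]
    mean_n = sum(n for n, _ in runs) / trials
    mean_s = sum(s for _, s in runs) / trials
    e_z = (theta * math.log(theta1 / theta0)
           + (1 - theta) * math.log((1 - theta1) / (1 - theta0)))
    return mean_s, mean_n * e_z

mean_s, mean_n_times_ez = check_identity()
print(mean_s, mean_n_times_ez)   # the two numbers should nearly agree
```

The simulation keeps the excess over the boundaries, so it illustrates (6.4) itself rather than the approximation (6.6), which neglects that excess.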


7. APPLICATION TO SAMPLING INSPECTION WHEN THE RESULT OF A SINGLE 
OBSERVATION IS A CLASSIFICATION INTO ONE OF TWO CATEGORIES 
7.1 Statement of the Problem 


The situation where the result of an observation is a classification
into one of two categories arises frequently in acceptance inspection of
manufactured products, since often each unit inspected is classified into
one of the two categories, non-defective and defective. We shall consider
the problem that a lot containing a large number of units is submitted
for acceptance inspection. The number of units in the lot is assumed
to be so large that we may treat the lot as containing infinitely
many units. Denote by $\theta$ the unknown proportion of defectives in the
lot. It will be possible to determine two values of $\theta$, $\theta_0$ and $\theta_1$ ($\theta_0 < \theta_1$),
such that acceptance of the lot is considered an error of practical
importance whenever $\theta \geq \theta_1$, and rejection of the lot is considered an error
of practical importance whenever $\theta \leq \theta_0$. Suppose that we want a sampling
inspection plan such that the probability of rejecting the lot
should not exceed a preassigned value $\alpha$ whenever $\theta \leq \theta_0$, and the
probability of accepting the lot should not exceed a preassigned value $\beta$
whenever $\theta \geq \theta_1$. Our problem is to find a proper sampling plan satisfying
these conditions.
7.2 The Sequential Probability Ratio Sampling Plan 

We have seen that a good sampling plan satisfying our requirements
is the sequential probability ratio plan corresponding to the quantities
$\theta_0$, $\theta_1$, $\alpha$ and $\beta$. According to formulas (4.4), (4.5), and (4.6), this
sampling plan is given as follows: Denote by $d_m$ the number of defective
units found in the first $m$ units inspected. Then the probability of
obtaining a sample equal to the observed is given by

$$p_{1m} = \theta_1^{d_m}(1 - \theta_1)^{m - d_m} \tag{7.1}$$

when $\theta = \theta_1$, and by

$$p_{0m} = \theta_0^{d_m}(1 - \theta_0)^{m - d_m} \tag{7.2}$$

when $\theta = \theta_0$. We continue inspecting additional units as long as

$$B < \frac{p_{1m}}{p_{0m}} < A. \tag{7.3}$$

Inspection is terminated with the acceptance of the lot if for some value
of $m$ we have

$$\frac{p_{1m}}{p_{0m}} \leq B. \tag{7.4}$$

Inspection is terminated with the rejection of the lot if for some value
of $m$ we have

$$\frac{p_{1m}}{p_{0m}} \geq A. \tag{7.5}$$

According to the results obtained in Section 4.2, approximate values
of $A$ and $B$ are given by $(1-\beta)/\alpha$ and $\beta/(1-\alpha)$, respectively.

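For practical computations it is useful to rewrite the inequalities
(7.3), (7.4), and (7.5) in a somewhat different form. Taking logarithms
on both sides, one can easily see that these inequalities can be rewritten
as follows:

$$\frac{\log\dfrac{\beta}{1-\alpha}}{\log\dfrac{\theta_1}{\theta_0} - \log\dfrac{1-\theta_1}{1-\theta_0}} + m\,\frac{\log\dfrac{1-\theta_0}{1-\theta_1}}{\log\dfrac{\theta_1}{\theta_0} - \log\dfrac{1-\theta_1}{1-\theta_0}} < d_m < \frac{\log\dfrac{1-\beta}{\alpha}}{\log\dfrac{\theta_1}{\theta_0} - \log\dfrac{1-\theta_1}{1-\theta_0}} + m\,\frac{\log\dfrac{1-\theta_0}{1-\theta_1}}{\log\dfrac{\theta_1}{\theta_0} - \log\dfrac{1-\theta_1}{1-\theta_0}}, \tag{7.6}$$

$$d_m \leq \frac{\log\dfrac{\beta}{1-\alpha}}{\log\dfrac{\theta_1}{\theta_0} - \log\dfrac{1-\theta_1}{1-\theta_0}} + m\,\frac{\log\dfrac{1-\theta_0}{1-\theta_1}}{\log\dfrac{\theta_1}{\theta_0} - \log\dfrac{1-\theta_1}{1-\theta_0}}, \tag{7.7}$$

and

$$d_m \geq \frac{\log\dfrac{1-\beta}{\alpha}}{\log\dfrac{\theta_1}{\theta_0} - \log\dfrac{1-\theta_1}{1-\theta_0}} + m\,\frac{\log\dfrac{1-\theta_0}{1-\theta_1}}{\log\dfrac{\theta_1}{\theta_0} - \log\dfrac{1-\theta_1}{1-\theta_0}}. \tag{7.8}$$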


On the basis of inequalities (7.6), (7.7), and (7.8) the inspection plan
may be carried out as follows: For each value of $m$ we determine the
acceptance number

$$A_m = \frac{\log\dfrac{\beta}{1-\alpha}}{\log\dfrac{\theta_1}{\theta_0} - \log\dfrac{1-\theta_1}{1-\theta_0}} + m\,\frac{\log\dfrac{1-\theta_0}{1-\theta_1}}{\log\dfrac{\theta_1}{\theta_0} - \log\dfrac{1-\theta_1}{1-\theta_0}} \tag{7.9}$$

and the rejection number

$$R_m = \frac{\log\dfrac{1-\beta}{\alpha}}{\log\dfrac{\theta_1}{\theta_0} - \log\dfrac{1-\theta_1}{1-\theta_0}} + m\,\frac{\log\dfrac{1-\theta_0}{1-\theta_1}}{\log\dfrac{\theta_1}{\theta_0} - \log\dfrac{1-\theta_1}{1-\theta_0}}. \tag{7.10}$$

These acceptance and rejection numbers are best computed before inspection
starts. If $A_m$ is not an integer, we replace it by the largest integer
$< A_m$. Similarly, if $R_m$ is not an integer, we replace it by the smallest
integer which is $> R_m$. We continue inspecting additional units as long as
$A_m < d_m < R_m$. If for some value of $m$ we have $d_m \leq A_m$, inspection is
terminated with the acceptance of the lot. If for some value of $m$ we have
$d_m \geq R_m$, inspection is terminated with the rejection of the lot.


7.3 A Numerical Example 
Let $\theta_0 = .10$, $\theta_1 = .30$, $\alpha = .02$, and $\beta = .03$. The acceptance and
rejection numbers, as well as the results of the observations in an experiment
are given in the following table.

TABLE 2

       m               A_m              d_m              R_m
Number of units    Acceptance    Number of defects    Rejection
   inspected         number          observed           number

Thus, the sampling inspection terminates at $m = 22$ with the rejection
of the lot.
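
The acceptance and rejection numbers (7.9) and (7.10) for the parameters of this example can be computed as sketched below. This Python fragment is an illustration, not part of the original paper. The function names are hypothetical, natural logarithms are used (any base gives the same numbers, since only ratios of logarithms occur), and the rounding follows the rule stated in Section 7.2.

```python
import math

def binomial_plan_lines(theta0, theta1, alpha, beta):
    """Return (h0, h1, s) such that A_m = h0 + s*m and R_m = h1 + s*m."""
    d = math.log(theta1 / theta0) - math.log((1 - theta1) / (1 - theta0))
    s = math.log((1 - theta0) / (1 - theta1)) / d    # common slope
    h0 = math.log(beta / (1 - alpha)) / d            # intercept in (7.9)
    h1 = math.log((1 - beta) / alpha) / d            # intercept in (7.10)
    return h0, h1, s

def acceptance_number(m, h0, s):
    """Largest integer not exceeding A_m."""
    return math.floor(h0 + s * m)

def rejection_number(m, h1, s):
    """Smallest integer not less than R_m."""
    return math.ceil(h1 + s * m)

def run_inspection(defects, theta0, theta1, alpha, beta):
    """defects is a sequence of 0/1 results; returns (m, decision)."""
    h0, h1, s = binomial_plan_lines(theta0, theta1, alpha, beta)
    d_m = 0
    m = 0
    for m, x in enumerate(defects, start=1):
        d_m += x
        if d_m <= acceptance_number(m, h0, s):
            return m, "accept"
        if d_m >= rejection_number(m, h1, s):
            return m, "reject"
    return m, "continue"

h0, h1, s = binomial_plan_lines(0.10, 0.30, 0.02, 0.03)
print(round(h0, 3), round(h1, 3), round(s, 3))
for m in (1, 5, 10, 14, 20):
    print(m, acceptance_number(m, h0, s), rejection_number(m, h1, s))
```

A negative acceptance number means that acceptance is not yet possible at that value of $m$, and a rejection number larger than $m$ means that rejection is not yet possible.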


7.4 The OC Curve of the Plan 
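According to equation (5.7) the OC curve is given by the equation

$$L(\theta) \sim \frac{A^{h(\theta)} - 1}{A^{h(\theta)} - B^{h(\theta)}} = \frac{\left(\dfrac{1-\beta}{\alpha}\right)^{h(\theta)} - 1}{\left(\dfrac{1-\beta}{\alpha}\right)^{h(\theta)} - \left(\dfrac{\beta}{1-\alpha}\right)^{h(\theta)}} \tag{7.11}$$

where $L(\theta)$ is the probability of accepting the lot when $\theta$ is the true
fraction of defectives, and $h(\theta)$ is the root of equation (5.2). Since in
our case $f(x, \theta) = \theta$ when $x = 1$, and $f(x, \theta) = 1 - \theta$ when $x = 0$, equation
(5.2) can be written as

$$\theta\left(\frac{\theta_1}{\theta_0}\right)^{h(\theta)} + (1 - \theta)\left(\frac{1-\theta_1}{1-\theta_0}\right)^{h(\theta)} = 1. \tag{7.12}$$

To plot the OC curve, it is not necessary to solve equation (7.12)
with respect to $h(\theta)$. We may consider $h$ as a parameter and solve (7.12)
with respect to $\theta$. Then we obtain

$$\theta = \frac{1 - \left(\dfrac{1-\theta_1}{1-\theta_0}\right)^{h}}{\left(\dfrac{\theta_1}{\theta_0}\right)^{h} - \left(\dfrac{1-\theta_1}{1-\theta_0}\right)^{h}} \tag{7.13}$$

and (7.11) can be written as

$$L(\theta) \sim \frac{\left(\dfrac{1-\beta}{\alpha}\right)^{h} - 1}{\left(\dfrac{1-\beta}{\alpha}\right)^{h} - \left(\dfrac{\beta}{1-\alpha}\right)^{h}}. \tag{7.14}$$

For any arbitrarily chosen value $h$ the point $(\theta, L(\theta))$ computed from
(7.13) and (7.14) will be a point on the OC curve. The OC curve can
be drawn by plotting a sufficiently large number of points $(\theta, L(\theta))$
corresponding to various values of $h$.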
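
The parametric construction of the OC curve from (7.13) and (7.14) can be sketched as follows. This Python fragment is an illustration, not part of the original paper. The function name `oc_point` and the grid of $h$ values are assumptions, and the parameters are those of the numerical example in Section 7.3. The value $h = 0$ is skipped because both formulas take the form 0/0 there.

```python
def oc_point(h, theta0, theta1, alpha, beta):
    """Return the point (theta, L(theta)) on the OC curve for parameter h != 0."""
    q = ((1 - theta1) / (1 - theta0)) ** h
    r = (theta1 / theta0) ** h
    theta = (1 - q) / (r - q)                       # formula (7.13)
    a_h = ((1 - beta) / alpha) ** h
    b_h = (beta / (1 - alpha)) ** h
    prob_accept = (a_h - 1) / (a_h - b_h)           # formula (7.14)
    return theta, prob_accept

# Grid of h values from -3 to 3 in steps of 1/4, omitting h = 0.
hs = [k / 4 for k in range(-12, 13) if k != 0]
curve = sorted(oc_point(h, 0.10, 0.30, 0.02, 0.03) for h in hs)
for theta, prob_accept in curve:
    print(f"theta = {theta:.4f}   L(theta) = {prob_accept:.4f}")
```

Two points of the curve are known in advance and serve as a check: $h = 1$ gives $\theta = \theta_0$ with $L = 1 - \alpha$, and $h = -1$ gives $\theta = \theta_1$ with $L = \beta$.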











7.5 The ASN Curve of the Plan 


In Section 6 we have derived the following approximation formula
for the average number of observations required by the sequential
probability ratio plan.

$$E_\theta(n) \sim \frac{L(\theta) \log\dfrac{\beta}{1-\alpha} + (1 - L(\theta)) \log\dfrac{1-\beta}{\alpha}}{E_\theta(z)} \tag{7.15}$$

where $z = \log [f(x, \theta_1)/f(x, \theta_0)]$ and $E_\theta(z)$ denotes the expected value of $z$
when $f(x, \theta)$ is the distribution function of $x$. In our case $f(x, \theta) = \theta$
when $x = 1$, and $f(x, \theta) = 1 - \theta$ when $x = 0$. Thus, $z$ can take only the
values $\log(\theta_1/\theta_0)$ and $\log[(1-\theta_1)/(1-\theta_0)]$ with probabilities $\theta$ and $1 - \theta$,
respectively. Hence

$$E_\theta(z) = \theta \log\frac{\theta_1}{\theta_0} + (1 - \theta) \log\frac{1-\theta_1}{1-\theta_0} \tag{7.16}$$

and consequently

$$E_\theta(n) \sim \frac{L(\theta) \log\dfrac{\beta}{1-\alpha} + (1 - L(\theta)) \log\dfrac{1-\beta}{\alpha}}{\theta \log\dfrac{\theta_1}{\theta_0} + (1 - \theta) \log\dfrac{1-\theta_1}{1-\theta_0}}. \tag{7.17}$$
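
Formula (7.17) can be evaluated along the same parameter $h$ that is used for the OC curve. The Python sketch below is an illustration, not part of the original paper. The function name `asn_point` is hypothetical and the parameters are those of the numerical example in Section 7.3. Near $h = 0$ both the numerator and the denominator of (7.17) tend to zero, so that value is avoided here.

```python
import math

def asn_point(h, theta0, theta1, alpha, beta):
    """Return (theta, L(theta), approximate E_theta(n)) for parameter h != 0."""
    q = ((1 - theta1) / (1 - theta0)) ** h
    r = (theta1 / theta0) ** h
    theta = (1 - q) / (r - q)                          # formula (7.13)
    a = (1 - beta) / alpha
    b = beta / (1 - alpha)
    prob_accept = (a**h - 1) / (a**h - b**h)           # formula (7.14)
    e_z = (theta * math.log(theta1 / theta0)
           + (1 - theta) * math.log((1 - theta1) / (1 - theta0)))   # (7.16)
    asn = (prob_accept * math.log(b)
           + (1 - prob_accept) * math.log(a)) / e_z    # formula (7.17)
    return theta, prob_accept, asn

for h in (2.0, 1.0, 0.5, -0.5, -1.0, -2.0):
    theta, prob_accept, asn = asn_point(h, 0.10, 0.30, 0.02, 0.03)
    print(f"theta = {theta:.3f}   L = {prob_accept:.3f}   E(n) = {asn:.1f}")
```

For these parameters the approximation gives about 28.7 observations on the average when $\theta = \theta_0$ and about 23.8 when $\theta = \theta_1$.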
FIGURE 3. TYPICAL ASN CURVE OF THE SEQUENTIAL PLAN DISCUSSED IN SECTION 7.
(Vertical axis: $E_\theta(n)$. Horizontal axis: $\theta$, marked at 0, $\theta_0$, $\theta_1$, and 1.)


8. SEQUENTIAL ANALYSIS OF DOUBLE DICHOTOMIES


8.1 Formulation of the Problem 


Consider two binomial distributions. Denote by $p_1$ the probability
of success in a single trial according to the first binomial distribution,
and by $p_2$ the probability of success in a single trial according to the
second binomial distribution. These probabilities $p_1$ and $p_2$ are assumed
to be unknown. Suppose that two alternative courses of action can be
taken, say action 1 and action 2, and that action 1 is preferred when
$p_1 > p_2$, and action 2 is preferred when $p_1 < p_2$. We shall consider the
case where the observations are taken in pairs, each pair containing an
observation from the first and an observation from the second binomial
distribution. The problem considered here is to devise a suitable sampling
plan for deciding between the two courses of action.

Such a problem will arise, for example, if we want to compare the
effectiveness of two production processes where effectiveness of a
production process is measured in terms of the proportion of effective units
in the sequence produced. A unit may be called effective if it has a
certain desirable property, for example, if it shows sufficient resistance
against pressure. Let $p_1$ denote the proportion of effectives in production
process 1 and $p_2$ the proportion of effectives in process 2. Then
these two production processes represent two binomial distributions.
The probability of success in a single trial (the probability that a unit
drawn at random will be effective) is equal to $p_1$ for process 1 and is
equal to $p_2$ for process 2. Suppose that the manufacturer does not know
the values $p_1$ and $p_2$, and that he has to decide whether he should adopt
process 1 or process 2. Assuming that the manufacturer prefers process
1 when $p_1 > p_2$ and prefers process 2 when $p_1 < p_2$, he is faced with the
problem of devising a suitable sampling plan to decide which production
process should be adopted.


8.2 Risks that we are Willing to Tolerate of Making Wrong Decisions 


The efficiency of the production process 1 may be measured by the
ratio of effectives to ineffectives produced, i.e., by $k_1 = p_1/(1 - p_1)$.
Similarly, the efficiency of the production process 2 may be measured
by $k_2 = p_2/(1 - p_2)$. The ratio $u = k_2/k_1$ can then be regarded as a reasonable
measure of the superiority of process 2 over process 1. In general,
the manufacturer will be able to select two values of $u$, say $u_0$ and $u_1$
($u_0 < u_1$), such that the adoption of process 1 is considered an error of
practical importance whenever $u \geq u_1$, and the adoption of process 2 is
considered an error of practical importance whenever $u \leq u_0$. If $u$ lies
between $u_0$ and $u_1$ the manufacturer does not care particularly which
process is adopted. The value $u_0$ need not be $\leq 1$. It may be chosen $> 1$
if production process 1 is in operation and transition to process 2 is
very costly. After the quantities $u_0$ and $u_1$ have been chosen, the risks
that we are willing to tolerate may reasonably be expressed as follows:
The probability of adopting process 2 should be less than or equal to a
preassigned value $\alpha$ whenever $u \leq u_0$, and the probability of adopting
process 1 should be less than or equal to a preassigned value $\beta$ whenever
$u \geq u_1$. Thus, the risks that we are willing to tolerate are characterized
by four quantities, $u_0$, $u_1$, $\alpha$, and $\beta$.


8.3 The Sequential Probability Ratio Sampling Plan 


We shall denote an effective unit by 1, and an ineffective unit by 0. 
According to our assumption the observations are made in pairs, where 
each pair contains an observation from process 1 and an observation 
from process 2. Since each observation can take only one of the values 
0 and 1, any pair (a, b) must take one of the four values (0, 0), (1, 1), 
(0, 1), and (1, 0). For the purpose of deciding between the two processes 
we shall merely consider the pairs (a, b) which are equal either to (0, 1) 
or to (1, 0), disregarding the pairs (0, 0) and (1, 1). Intuitively it seems 
plausible that the pairs (0, 0) and (1, 1) cannot give much information 
as to the superiority of one process over the other, and, therefore, such 
pairs may be disregarded. A more formal justification of this was given 
in Section 5.3.3 of [1]. 

The probability of obtaining a pair (0, 1) is equal to $(1 - p_1)p_2$ and
the probability of obtaining a pair (1, 0) is equal to $p_1(1 - p_2)$. Hence,
knowing that the observed pair $(a, b)$ is equal to one of the pairs (0, 1)
and (1, 0), the (conditional) probability that it is equal to (0, 1) is
given by

$$\theta = \frac{(1 - p_1)p_2}{p_1(1 - p_2) + p_2(1 - p_1)} \tag{8.1}$$

and the (conditional) probability that it is equal to (1, 0) is given by

$$1 - \theta = \frac{p_1(1 - p_2)}{p_1(1 - p_2) + p_2(1 - p_1)}. \tag{8.2}$$

Denote by $t_1$ the number of pairs (1, 0) and by $t_2$ the number of
pairs (0, 1) in the observed sequence of pairs. Then $t_2$ is distributed
like the number of successes in a sequence of $t = t_1 + t_2$ independent
trials, the probability of a success in a single trial being equal to $\theta$.

Our requirements concerning the risks that we are willing to tolerate
can easily be expressed in terms of $\theta$ instead of $u$, since $\theta$ is a simple
function of $u$. In fact,

$$\theta = \frac{(1 - p_1)p_2}{p_1(1 - p_2) + p_2(1 - p_1)} = \frac{\dfrac{p_2(1 - p_1)}{p_1(1 - p_2)}}{1 + \dfrac{p_2(1 - p_1)}{p_1(1 - p_2)}} = \frac{u}{1 + u}. \tag{8.3}$$

Hence, the risks that can be tolerated may be expressed as follows:
the probability of adopting process 2 should be less than or equal to $\alpha$
whenever $\theta \leq \theta_0 = u_0/(1 + u_0)$, and the probability of adopting process 1
should be less than or equal to $\beta$ whenever $\theta \geq \theta_1 = u_1/(1 + u_1)$.

It is now clear that the sequential probability ratio plan for our problem
can be obtained from that given in Section 7.2 by making the following
substitutions: The number $m$ of observations in Section 7.2 is
to be replaced by the number $t$ of pairs (0, 1) and (1, 0) observed. The
number $d_m$ of defects in Section 7.2 is to be replaced by $t_2$ (number
of pairs (0, 1) observed). Furthermore, we have to put $\theta_1 = u_1/(1 + u_1)$
and $\theta_0 = u_0/(1 + u_0)$. Thus, for each value of $t$ the acceptance number is
given by

$$A_t = \frac{\log\dfrac{\beta}{1-\alpha}}{\log u_1 - \log u_0} + t\,\frac{\log\dfrac{1 + u_1}{1 + u_0}}{\log u_1 - \log u_0} \tag{8.4}$$

and the rejection number is given by

$$R_t = \frac{\log\dfrac{1-\beta}{\alpha}}{\log u_1 - \log u_0} + t\,\frac{\log\dfrac{1 + u_1}{1 + u_0}}{\log u_1 - \log u_0}. \tag{8.5}$$

The acceptance and rejection numbers are best tabulated before
experimentation starts. The sampling plan is then carried out as follows:
Observations are taken in pairs where each pair contains an observation
from process 1 and an observation from process 2. We continue taking
observations as long as $A_t < t_2 < R_t$. If $t_2 \geq R_t$, we terminate the sampling
procedure with the adoption of process 2. If $t_2 \leq A_t$, we terminate the
sampling procedure with the adoption of process 1.


8.4 A Numerical Example 

Let $u_0 = 1.3$, $u_1 = 3$, $\alpha = .03$, and $\beta = .10$. The observed pairs (0, 1) and
(1, 0) in an experiment, and the rejection and acceptance numbers are
given in the following table.

TABLE 3

       t              Pair        A_t             t_2             R_t
Number of pairs     observed   Acceptance   Number of pairs    Rejection
(0, 1) and (1, 0)                number     (0, 1) observed     number
    observed

       1             (0, 1)        -               1               -
       2             (0, 1)        -               2               -
       3             (1, 0)        -               2               -
       4             (1, 0)        -               2               -
       5             (1, 0)        0               2               -
       6             (0, 1)        1               3               -
       7             (1, 0)        1               3               -
       8             (0, 1)        2               4               -
       9             (0, 1)        3               5               -
      10             (1, 0)        3               5               -
      11             (0, 1)        4               6               -
      12             (0, 1)        5               7               -
      13             (0, 1)        5               8              13
      14             (1, 0)        6               8              14
      15             (1, 0)        7               8              14
      16             (0, 1)        7               9              15
      17             (1, 0)        8               9              16
      18             (1, 0)        9               9              16
      19                           9                              17
      20                          10                              18
      21                          11                              18
      22                          11                              19
      23                          12                              20
      24                          13                              20
      25                          13                              21
      26                          14                              22
      27                          15                              22
      28                          15                              23
      29                          16                              24

Thus, the sampling process is terminated at $t = 18$ with the adoption
of process 1.
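
The plan of Section 8.3 can be replayed on the sequence of pairs listed in Table 3. The Python sketch below is an illustration, not part of the original paper. The function names are hypothetical, natural logarithms are used, and the comparison is made against the unrounded values of (8.4) and (8.5), which gives the same decisions as the rounded acceptance and rejection numbers because $t_2$ is an integer.

```python
import math

def dichotomy_lines(u0, u1, alpha, beta):
    """Return (h0, h1, slope) with A_t = h0 + slope*t and R_t = h1 + slope*t."""
    d = math.log(u1) - math.log(u0)
    slope = math.log((1 + u1) / (1 + u0)) / d
    h0 = math.log(beta / (1 - alpha)) / d      # intercept in (8.4)
    h1 = math.log((1 - beta) / alpha) / d      # intercept in (8.5)
    return h0, h1, slope

def run_plan(pairs, u0, u1, alpha, beta):
    """pairs is a sequence of (a, b) results; returns (t, decision)."""
    h0, h1, slope = dichotomy_lines(u0, u1, alpha, beta)
    t = 0     # number of pairs (0, 1) and (1, 0) so far
    t2 = 0    # number of pairs (0, 1) so far
    for pair in pairs:
        if pair not in ((0, 1), (1, 0)):
            continue                 # pairs (0, 0) and (1, 1) are disregarded
        t += 1
        if pair == (0, 1):
            t2 += 1
        if t2 <= h0 + slope * t:
            return t, "process 1"
        if t2 >= h1 + slope * t:
            return t, "process 2"
    return t, "continue"

# The eighteen pairs observed in Table 3, in order.
table3 = [(0, 1), (0, 1), (1, 0), (1, 0), (1, 0), (0, 1),
          (1, 0), (0, 1), (0, 1), (1, 0), (0, 1), (0, 1),
          (0, 1), (1, 0), (1, 0), (0, 1), (1, 0), (1, 0)]

print(run_plan(table3, 1.3, 3.0, 0.03, 0.10))   # (18, 'process 1')
```

The result agrees with the table: sampling stops at $t = 18$, where $t_2 = 9$ no longer exceeds the acceptance number.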


8.5 The OC Curve of the Plan 


Denote by $L(u)$ the probability that process 1 will be adopted when
the ratio $k_2/k_1$ has the value $u$. An approximation to the value of $L(u)$
can be obtained from (7.13) and (7.14) by substituting $u_0/(1 + u_0)$ for
$\theta_0$, $u_1/(1 + u_1)$ for $\theta_1$, and $u/(1 + u)$ for $\theta$. Making these substitutions we
have

$$\frac{u}{1 + u} = \frac{1 - \left(\dfrac{1 + u_0}{1 + u_1}\right)^{h}}{\left(\dfrac{u_1(1 + u_0)}{u_0(1 + u_1)}\right)^{h} - \left(\dfrac{1 + u_0}{1 + u_1}\right)^{h}} \tag{8.6}$$

and

$$L(u) \sim \frac{\left(\dfrac{1-\beta}{\alpha}\right)^{h} - 1}{\left(\dfrac{1-\beta}{\alpha}\right)^{h} - \left(\dfrac{\beta}{1-\alpha}\right)^{h}}. \tag{8.7}$$

For any given value $h$ we can compute $u$ and $L(u)$ from these equations.
The OC curve can be drawn by plotting the points $(u, L(u))$ for a sufficiently
large number of values of $h$.
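
Equations (8.6) and (8.7) can be evaluated in the same parametric way as in Section 7.4. The Python sketch below is an illustration, not part of the original paper. The function name `oc_point_u` is hypothetical and the parameters are those of the numerical example in Section 8.4. The value of $u$ is recovered from the right-hand member of (8.6), which equals $u/(1 + u)$.

```python
def oc_point_u(h, u0, u1, alpha, beta):
    """Return the point (u, L(u)) on the OC curve for parameter h != 0."""
    q = ((1 + u0) / (1 + u1)) ** h
    r = (u1 * (1 + u0) / (u0 * (1 + u1))) ** h
    ratio = (1 - q) / (r - q)            # right-hand member of (8.6)
    u = ratio / (1 - ratio)              # solve u / (1 + u) = ratio for u
    a_h = ((1 - beta) / alpha) ** h
    b_h = (beta / (1 - alpha)) ** h
    return u, (a_h - 1) / (a_h - b_h)    # formula (8.7)

for h in (2.0, 1.0, 0.5, -0.5, -1.0, -2.0):
    u, prob_process1 = oc_point_u(h, 1.3, 3.0, 0.03, 0.10)
    print(f"u = {u:.3f}   L(u) = {prob_process1:.3f}")
```

As a check, $h = 1$ returns $u = u_0$ with $L = 1 - \alpha$, and $h = -1$ returns $u = u_1$ with $L = \beta$.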


8.6 The ASN Curve of the Plan 


For any value $u$ of the ratio $k_2/k_1$ the expected value of the total
number of pairs (0, 1) and (1, 0) can be obtained from (7.17) by
substituting $E_u(t)$ for $E_\theta(n)$, $L(u)$ for $L(\theta)$, $u_1/(1 + u_1)$ for $\theta_1$, $u_0/(1 + u_0)$
for $\theta_0$, and $u/(1 + u)$ for $\theta$. Thus

$$E_u(t) \sim \frac{L(u) \log\dfrac{\beta}{1-\alpha} + (1 - L(u)) \log\dfrac{1-\beta}{\alpha}}{\dfrac{u}{1 + u} \log\dfrac{u_1(1 + u_0)}{u_0(1 + u_1)} + \dfrac{1}{1 + u} \log\dfrac{1 + u_0}{1 + u_1}}. \tag{8.8}$$

The expected value of the total number of pairs (including also the
pairs (0, 0) and (1, 1)) can be obtained by dividing the right-hand member
in (8.8) by $p_1(1 - p_2) + p_2(1 - p_1)$.
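
A short numerical sketch of (8.8) and of the final division follows. This Python fragment is an illustration, not part of the original paper. The function names are hypothetical, and the values $p_1 = .5$ and $p_2 = 1.3/2.3$ are assumed purely for the example; they are chosen so that $k_1 = 1$, $k_2 = 1.3$ and hence $u = u_0 = 1.3$, for which $L(u) = 1 - \alpha$.

```python
import math

def expected_discordant_pairs(u, prob_process1, u0, u1, alpha, beta):
    """Approximate E_u(t), the expected number of pairs (0,1) and (1,0), per (8.8)."""
    theta = u / (1 + u)
    e_z = (theta * math.log(u1 * (1 + u0) / (u0 * (1 + u1)))
           + (1 - theta) * math.log((1 + u0) / (1 + u1)))
    numerator = (prob_process1 * math.log(beta / (1 - alpha))
                 + (1 - prob_process1) * math.log((1 - beta) / alpha))
    return numerator / e_z

def expected_total_pairs(expected_t, p1, p2):
    """Expected number of all pairs, including (0, 0) and (1, 1)."""
    return expected_t / (p1 * (1 - p2) + p2 * (1 - p1))

u0, u1, alpha, beta = 1.3, 3.0, 0.03, 0.10
p1, p2 = 0.5, 1.3 / 2.3            # assumed values giving u = 1.3
e_t = expected_discordant_pairs(1.3, 1 - alpha, u0, u1, alpha, beta)
print(round(e_t, 1))                                  # about 26.0
print(round(expected_total_pairs(e_t, p1, p2), 1))    # about 52.1
```

With these assumed probabilities half of all pairs are of the form (0, 1) or (1, 0), so the expected total number of pairs is twice $E_u(t)$.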


9. APPLICATION TO SAMPLING INSPECTION WHEN THE RESULT OF AN 
OBSERVATION IS A NORMALLY DISTRIBUTED VARIATE 
WITH KNOWN STANDARD DEVIATION 


9.1 Formulation of the Problem 


In Section 7 we have considered the case that a unit of a manufactured
product is classified into one of the two categories, defective and
non-defective. Now we shall assume that the result of an observation is
a measurement $x$ of some characteristic of the unit, such as the weight,
or diameter, or hardness, etc. Suppose that a lot containing a large
number of units is submitted for acceptance inspection. The number
of units in the lot is assumed to be sufficiently large so that the lot may
be treated as containing infinitely many units. The value of $x$ will, in
general, vary from unit to unit, and it is assumed that the distribution
of $x$ in the lot is normal with known standard deviation $\sigma$ and unknown
mean $\theta$. We shall assume that the preference for acceptance or rejection
of the lot depends on the value of $\theta$ in a manner as described in Section
3. Thus, there exists a value $\theta'$ such that acceptance is preferred when
$\theta < \theta'$, rejection is preferred when $\theta > \theta'$, and there is no preference for
either action when $\theta = \theta'$. Furthermore, the preference for acceptance
increases with decreasing value of $\theta$ over the domain $\theta < \theta'$, and the
preference for rejection increases with increasing $\theta$ over the domain
$\theta > \theta'$. Suppose that two values of $\theta$ are selected, $\theta_0$ and $\theta_1$ ($\theta_0 < \theta_1$), and
a sampling plan is required such that the probability of rejecting the
lot is less than or equal to a preassigned value $\alpha$ whenever $\theta \leq \theta_0$, and
the probability of accepting the lot is less than or equal to a preassigned
value $\beta$ whenever $\theta \geq \theta_1$.


9.2 The Sequential Probability Ratio Sampling Plan 
The distribution (probability density) of $x$ is given by

$$f(x, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(1/2\sigma^2)(x - \theta)^2}. \tag{9.1}$$

We shall denote the successive observations on $x$ by $x_1, x_2, \ldots$, etc.
Then

$$p_{1m} = \frac{1}{(2\pi)^{m/2}\sigma^m}\, e^{-(1/2\sigma^2)\sum_{\alpha=1}^{m}(x_\alpha - \theta_1)^2} \tag{9.2}$$

and

$$p_{0m} = \frac{1}{(2\pi)^{m/2}\sigma^m}\, e^{-(1/2\sigma^2)\sum_{\alpha=1}^{m}(x_\alpha - \theta_0)^2}. \tag{9.3}$$

According to formulas (4.4), (4.5), and (4.6) the sequential probability
ratio plan is given as follows: We take additional observations
as long as

$$B < \frac{e^{-(1/2\sigma^2)\sum_{\alpha=1}^{m}(x_\alpha - \theta_1)^2}}{e^{-(1/2\sigma^2)\sum_{\alpha=1}^{m}(x_\alpha - \theta_0)^2}} < A. \tag{9.4}$$

Inspection is terminated with the acceptance of the lot if

$$\frac{e^{-(1/2\sigma^2)\sum_{\alpha=1}^{m}(x_\alpha - \theta_1)^2}}{e^{-(1/2\sigma^2)\sum_{\alpha=1}^{m}(x_\alpha - \theta_0)^2}} \leq B. \tag{9.5}$$

Inspection is terminated with the rejection of the lot if

$$\frac{e^{-(1/2\sigma^2)\sum_{\alpha=1}^{m}(x_\alpha - \theta_1)^2}}{e^{-(1/2\sigma^2)\sum_{\alpha=1}^{m}(x_\alpha - \theta_0)^2}} \geq A. \tag{9.6}$$

According to the results in Section 4.2, approximate values of $A$ and
$B$ are given by $(1-\beta)/\alpha$ and $\beta/(1-\alpha)$, respectively.

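Taking logarithms on both sides, one can easily verify that the
inequalities (9.4), (9.5), and (9.6) can be written as follows:

$$\frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{\beta}{1-\alpha} + m\,\frac{\theta_0 + \theta_1}{2} < \sum_{\alpha=1}^{m} x_\alpha < \frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{1-\beta}{\alpha} + m\,\frac{\theta_0 + \theta_1}{2}, \tag{9.7}$$

$$\sum_{\alpha=1}^{m} x_\alpha \leq \frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{\beta}{1-\alpha} + m\,\frac{\theta_0 + \theta_1}{2}, \tag{9.8}$$

and

$$\sum_{\alpha=1}^{m} x_\alpha \geq \frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{1-\beta}{\alpha} + m\,\frac{\theta_0 + \theta_1}{2}. \tag{9.9}$$

Using the above inequalities, the inspection plan may be carried out
as follows: For each $m$ compute the acceptance number

$$A_m = \frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{\beta}{1-\alpha} + m\,\frac{\theta_0 + \theta_1}{2} \tag{9.10}$$

and the rejection number

$$R_m = \frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{1-\beta}{\alpha} + m\,\frac{\theta_0 + \theta_1}{2}. \tag{9.11}$$

These acceptance and rejection numbers can be computed before
inspection starts. Continue inspection as long as $A_m < \sum_{\alpha=1}^{m} x_\alpha < R_m$. Accept
the lot if $\sum_{\alpha=1}^{m} x_\alpha \leq A_m$, and reject the lot if $\sum_{\alpha=1}^{m} x_\alpha \geq R_m$.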


9.3 A Numerical Example 
Let $\theta_0 = 135$, $\theta_1 = 150$, $\alpha = .01$, and $\beta = .03$. Furthermore, let the
standard deviation $\sigma$ be equal to 25. The observations and the acceptance
and rejection numbers are tabulated in the following table.

TABLE 4

      m            A_m           x               Σx              R_m
  Number of     Acceptance    Observed    Cumulated sum of    Rejection
 observations     number       value      observed values      number

      1             -           151             151              334
      2            139          144             295              476
      3            281          121             416              619
      4            424          137             553              761
      5            566          138             691              904
      6            709          136             827             1046
      7            851          155             982             1189
      8            994          160            1142             1331
      9           1136          144            1286             1474
     10           1279          145            1431             1616
     11           1421          130            1561             1759
     12           1564          120            1681             1901
     13           1706          104            1785             2044
     14           1849          140            1925             2186
     15           1991          125            2050             2329
     16           2134          106            2156             2471
     17           2276          145            2301             2614
     18           2419          123            2424             2756
     19           2561          138            2562             2899
     20           2704          108            2670             3041
     21           2846                                          3184
     22           2989                                          3326
     23           3131                                          3469
     24           3274                                          3611
     25           3416                                          3754
     26           3559                                          3896
     27           3701                                          4039
     28           3844                                          4181
     29           3986                                          4324
     30           4129                                          4466
     31           4271                                          4609
     32           4414                                          4751
     33           4556                                          4894
     34           4699                                          5036
     35           4841                                          5179

Thus, the sampling inspection is terminated at $m = 20$ with the
acceptance of the lot.
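
The plan of Section 9.2 can be replayed on the twenty observations of Table 4. The Python sketch below is an illustration, not part of the original paper. The function names are hypothetical, natural logarithms are used, and the cumulated sum is compared with the unrounded values of (9.10) and (9.11).

```python
import math

def normal_plan_lines(theta0, theta1, sigma, alpha, beta):
    """Return (h0, h1, slope) with A_m = h0 + slope*m and R_m = h1 + slope*m."""
    c = sigma**2 / (theta1 - theta0)
    h0 = c * math.log(beta / (1 - alpha))      # intercept in (9.10)
    h1 = c * math.log((1 - beta) / alpha)      # intercept in (9.11)
    slope = (theta0 + theta1) / 2
    return h0, h1, slope

def run_normal_plan(xs, theta0, theta1, sigma, alpha, beta):
    """xs is a sequence of measurements; returns (m, decision)."""
    h0, h1, slope = normal_plan_lines(theta0, theta1, sigma, alpha, beta)
    total = 0.0
    m = 0
    for m, x in enumerate(xs, start=1):
        total += x
        if total <= h0 + slope * m:
            return m, "accept"
        if total >= h1 + slope * m:
            return m, "reject"
    return m, "continue"

# Observed values from Table 4, in order.
table4 = [151, 144, 121, 137, 138, 136, 155, 160, 144, 145,
          130, 120, 104, 140, 125, 106, 145, 123, 138, 108]

print(run_normal_plan(table4, 135, 150, 25, 0.01, 0.03))   # (20, 'accept')
```

The decision agrees with the table: the cumulated sum 2670 at $m = 20$ falls below the acceptance number, while at $m = 19$ the sum 2562 is still just above it.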


9.4 The OC Curve of the Plan 


Let $L(\theta)$ be the probability that the lot will be accepted when the
mean of $x$ is $\theta$. If $(\theta_1 - \theta_0)/\sigma$ is fairly small, a good approximation to
$L(\theta)$ is given by formula (5.7), i.e.,

$$L(\theta) \sim \frac{\left(\dfrac{1-\beta}{\alpha}\right)^{h(\theta)} - 1}{\left(\dfrac{1-\beta}{\alpha}\right)^{h(\theta)} - \left(\dfrac{\beta}{1-\alpha}\right)^{h(\theta)}} \tag{9.12}$$

where $h(\theta)$ is the non-zero root of the equation we obtain from (5.2) by
substituting the right-hand member of (9.1) for $f(x, \theta)$. It can be shown
that this root is given by

$$h(\theta) = \frac{\theta_1 + \theta_0 - 2\theta}{\theta_1 - \theta_0}. \tag{9.13}$$

The OC curve is given by (9.12), substituting the right-hand member
of (9.13) for $h(\theta)$.
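
Because (9.13) gives $h(\theta)$ explicitly, the OC curve for the normal case can be computed directly as a function of $\theta$. The Python sketch below is an illustration, not part of the original paper. The function names are hypothetical, the parameters are those of the numerical example in Section 9.3, and the value used at $h(\theta) = 0$ (the midpoint between $\theta_0$ and $\theta_1$) is the limit of (9.12) as $h \to 0$, which is an assumption added for the example.

```python
import math

def h_normal(theta, theta0, theta1):
    """Root h(theta) from formula (9.13)."""
    return (theta1 + theta0 - 2 * theta) / (theta1 - theta0)

def oc_normal(theta, theta0, theta1, alpha, beta):
    """Approximate probability of acceptance from formula (9.12)."""
    a = (1 - beta) / alpha
    b = beta / (1 - alpha)
    h = h_normal(theta, theta0, theta1)
    if abs(h) < 1e-9:
        # Limit of (9.12) as h -> 0; an added assumption of this sketch.
        return math.log(a) / (math.log(a) - math.log(b))
    return (a**h - 1) / (a**h - b**h)

for theta in (130, 135, 140, 142.5, 145, 150, 155):
    print(theta, round(oc_normal(theta, 135, 150, 0.01, 0.03), 4))
```

The standard deviation does not enter (9.12) and (9.13), so within this approximation the OC curve depends on $\theta_0$, $\theta_1$, $\alpha$ and $\beta$ only. At $\theta = \theta_0$ the function returns $1 - \alpha$ and at $\theta = \theta_1$ it returns $\beta$.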

9.5 The ASN Curve 


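According to formula (6.7), a good approximation to the expected
value of the number of observations required by the plan is given by

$$E_\theta(n) \sim \frac{L(\theta) \log\dfrac{\beta}{1-\alpha} + (1 - L(\theta)) \log\dfrac{1-\beta}{\alpha}}{E_\theta(z)} \tag{9.14}$$

where $z = \log [f(x, \theta_1)/f(x, \theta_0)]$ and $E_\theta(z)$ denotes the expected value of $z$
when $\theta$ is the true mean value of $x$. Since $f(x, \theta)$ is given by (9.1), we
find that

$$z = \frac{1}{2\sigma^2}\left[2(\theta_1 - \theta_0)x + \theta_0^2 - \theta_1^2\right] \tag{9.15}$$

and

$$E_\theta(z) = \frac{1}{2\sigma^2}\left[\theta_0^2 - \theta_1^2 + 2(\theta_1 - \theta_0)\theta\right]. \tag{9.16}$$

Hence

$$E_\theta(n) \sim 2\sigma^2\, \frac{L(\theta) \log\dfrac{\beta}{1-\alpha} + (1 - L(\theta)) \log\dfrac{1-\beta}{\alpha}}{\theta_0^2 - \theta_1^2 + 2(\theta_1 - \theta_0)\theta}. \tag{9.17}$$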
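
Formula (9.17) can be evaluated together with (9.12) and (9.13). The Python sketch below is an illustration, not part of the original paper. The function name `asn_normal` is hypothetical and the parameters are those of the numerical example in Section 9.3. At the midpoint $\theta = (\theta_0 + \theta_1)/2$ both numerator and denominator of (9.17) vanish, so the sketch declines to evaluate the formula there.

```python
import math

def asn_normal(theta, theta0, theta1, sigma, alpha, beta):
    """Approximate expected number of observations from formula (9.17)."""
    a = (1 - beta) / alpha
    b = beta / (1 - alpha)
    h = (theta1 + theta0 - 2 * theta) / (theta1 - theta0)      # (9.13)
    if abs(h) < 1e-9:
        raise ValueError("(9.17) takes the form 0/0 at the midpoint")
    prob_accept = (a**h - 1) / (a**h - b**h)                   # (9.12)
    numerator = (prob_accept * math.log(b)
                 + (1 - prob_accept) * math.log(a))
    denominator = theta0**2 - theta1**2 + 2 * (theta1 - theta0) * theta
    return 2 * sigma**2 * numerator / denominator

for theta in (130, 135, 140, 145, 150, 155):
    print(theta, round(asn_normal(theta, 135, 150, 25, 0.01, 0.03), 1))
```

For these parameters the approximation gives about 19 observations on the average when $\theta = \theta_0 = 135$ and about 24 when $\theta = \theta_1 = 150$.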
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ON TRAINING IN SAMPLING* 


By W. EDWARDS DEMING
Bureau of the Budget 


SAMPLING CONTRASTED WITH OTHER PARTIAL INVESTIGATIONS 


I SHALL COMMENT only on some basic principles of efficiency
in the planning of surveys, particularly so far as government
statistical practice is concerned. Something useful should be accom-
plished by setting forth the kind of responsibilities that the statistician
must be prepared to face. The result will not be a paper on methodology
nor a list of courses that one ought to take or be able to pass, but in-
stead will be a statement of certain end-criteria which training in
statistics should attempt to meet.

Not all partial investigations should be samples. Whether to use
sampling or some other approach (quota method, judgment selection,
cut-off, incomplete list, voluntary response) is a question that should
be answered by asking which method will give the most useful informa-
tion for the money. The answer is going to vary with the nature of the
enquiry and the precision demanded. As part of a broad program in
training in sampling, statisticians ought to assume responsibilities in
clarifying to the administrators whom they serve and to their scientific
colleagues in economics, sociology, retail trade, psychology, public
opinion, engineering, and other professions, certain characteristics of a
professional job of sampling and certain important distinctions between
sampling and other partial investigations. Appreciation for good sample
design can be acquired without becoming an expert in sampling, just as
appreciation for good music can be developed without learning to play
an instrument or learning to write harmony or counterpoint.

Characteristics of a professional job of sampling. A professional job of
sampling bears three marks:

i. The selection of the respondents is automatic.



That is, the method of selection is embodied in the sampling plans and 
thereafter is not influenced by the interviewer’s judgment or anyone 
else’s. Neither is respect paid to the respondent’s wishes to be in or out 
of the sample. Call-backs on a subsample¹ of those people not at home


* A paper presented at the 104th Annual Meeting of the American Statistical Association, Wash- 
ington, December 27, 1944. 

1 My colleague William N. Hurwitz has found a simple but important mathematical solution to 
the problem of what percentage of the non-respondents of a mailed questionnaire should be followed up 
by personal interview, in order to obtain the greatest possible amount of information for a given ex- 
penditure. The important desideratum is the ratio of the cost of a mailed questionnaire to a personal 
call. Mr. Hurwitz’s solution was expounded March 15, 1943, at one of the Seminars in Sampling and 
Statistical Inference at the Graduate School of the Department of Agriculture. It was later published 
in the “Working plan for the annual census of lumber produced in 1943,” which is obtainable from 
either the Census or the Forest Service. The results are being put to use in more and more government 
surveys, with consequent savings and increase in reliability.
at the first call must be made until results are obtained. In continuing
surveys covering identical units, some method of substitution must be 
worked out for units that cease to exist. If a mailed questionnaire is 
used, there must be unrelenting follow-up by mail, telephone, and
telegraph; and finally, if necessary, by personal interview of a subsample¹
of those unmoved by the less expensive devices. Every element of the 
universe (i.e. every respondent) has a chance of being included in the 
sample, and by the use of tested procedures of selection involving ran- 
dom numbers or systematic rules, these chances have known numerical 
values which are identified as actual relative frequencies in repeated 
trials. These chances are used in the computation procedure (Point ii) 
and in the calculation of the risks (Point iii). 


It hardly need be added, but the questionnaire or other method of ob- 
taining the response (such as a mechanical or electrical contrivance) 
should not be so complicated or so fascinating that it invalidates the 
results, regardless of how the respondents are selected. 


ii. The procedure for processing the data and computing the 
characteristics of the universe is laid out in advance as part of the 
sampling plan. 


Actually, there may be laid out in advance not just one but several 
possible computing procedures (such as straight weighting, ratio- 
estimates, adjustment to known marginal totals, or other devices). 
These will vary in cost and time, and will produce different degrees of 
precision; and a final decision on which one to use can be deferred and 
often ought to be deferred pending the outcome of final appropriation 
of funds, time allowable, personnel and machines available at the time 
the data come in. It is also permissible to defer a final decision on the 
computing-procedure pending the outcome of certain studies of vari- 
ances which will determine the precisions to be obtained by various 
procedures of computation. The decision on how to do the computing is 
not based on judgment as to whether the data regarding some character- 
istic seem to be high or low. 


iii. There exists an error-formula for calculating the precision of 

the results. 
Corresponding to every sample design, there is, in principle at least, 
an error-formula. If the error-formula is known, the risk of encountering 
a sampling error of a postulated magnitude (2, 5, 15 per cent) can be 
computed and made as small as desired by taking a big enough sample 
and increasing the cost accordingly. If the error-formula overtaxes the 
mathematician’s ability and can not be written down, the design will 
still be usable if its precision is known to exceed the precision of some 
other design whose error-formula is known.


True enough, the error-formula will contain one or more constants²


² These constants are nearly always some of the totals, averages, proportions, variances and
covariances which characterize not only different segments of the universe that is to be sampled but the
for which exact values cannot be stated in advance, but the existence
of the formula points unmistakably to a job of sampling; its existence 
implies automatic selection of the respondents and a pre-detailed pro- 
cedure for processing the data. In spite of the fact that the constants 
in the error-formula are not known as precisely before the survey is 
taken as they are afterward, it is nevertheless almost always possible, 
by intuition and knowledge of the universe and the subject-matter 
being studied, to arrive at suitable numerical values in advance, and 
thus to make approximate calculations of the width of the error band 
that corresponds to a certain size of sample. It is in fact by such calcu- 
lations that the required size of sample is decided upon; it must be 
big enough to provide the precision required, yet not too big (cf. the 
next part), else it would be too expensive and time-consuming. 

After the survey is taken, a new and more exact evaluation of the 
constants in the error-formula should be made in the processing of the 
data. This is done because one cannot be content to rest upon the 
assumption that the procedures in the field were carried out as he in- 
tended, or that the universe was what he assumed it to be in every 
respect. The published tables should show what the sampling errors 
were as evaluated from the returns—not what they ought to have been, 
but what they actually were.³ Evaluation of the returns serves another
useful purpose by disclosing ways in which the instructions can be im- 
proved in future surveys to circumvent difficulties not hitherto appre- 
ciated. 

Some obligations of the statistician. In spite of the inclusion of courses 
in statistics in the curricula of economics, sociology, marketing, agri- 
culture, biology, and elsewhere, it is not widely enough recognized that 
in a professional job of sampling the risks, being controllable and known 
in advance, are tailored to the requirements of precision with efficient 
use of funds and facilities. This important feature of sampling needs 
to be taught by statisticians to their professional colleagues in other 
fields. In contrast, when the respondents are picked up by quota 
selection or expert selection of “representative” counties, farms, or 
firms, too frequently no useful limits of bias can be assigned: the in- 
formation furnished is then of unknown quality and may lead to ex- 
tremely embarrassing questions of interpretation. Unless biases can be 
removed satisfactorily a method of collection that appears to be cheap 
is too often cheap only in the sense of providing a lot of schedules per 





performance of the field-workers as well. These things are of course not known as well before the survey 
is taken as afterward, else there would be no use taking the survey. 

3 For an example, see the Census publications of the sample censuses of Congested Production 
Areas, 1944. 
dollar, but may actually be very costly when measured in the amount 
of useful information per dollar or the damage done through misinfor- 
mation. 

The statistician in designing a sample makes use of all the judgment 
and specialized knowledge regarding the universe that can be mustered, 
but he uses them in such manner that they are effective in providing 
information that can be interpreted—i.e., information possessing cal- 
culable risks. He puts the judgment of experts to use in determining 
the best modes of stratification, the best kind of sampling unit, in 
deciding how to pair and enlarge the sampling units to gain hetero- 
geneity and smaller sampling errors without increasing the cost, and in 
arriving at rational and usable values for the variances, covariances, 
and other constants in the error-formula by which the size of sample is 
tailored to the requirements and by which the extent of usable cross- 
classification in the tabulation-program is decided upon in advance. 

On the other hand, the statistician in considering plans for a survey 
is expected to know when a judgment-selection of respondents will 
suffice. Sometimes he can quickly pick out a small number of farms or 
cities in which conditions are so greatly in contrast that low-cost 
surveys therein will prove or disprove a hypothesis and solve the prob- 
lem. Not all partial investigations need to be sampling jobs, but 
it is exceedingly important to recognize and publicize the distinc- 
tion between the inferences that can be drawn from samples and from 
other kinds of partial investigations. Sampling requires additional effort 
and skill of a specialized kind, and this skill is not always obtainable 
nor worth the extra cost. In many circumstances, however, this extra 
effort and cost pay big dividends and render other procedures obsolete. 

Need for studies in the removal of biases. Studies in the removal of the 
biases that arise in cheap methods of collecting data (such as the 
omission of “call backs” on families not at home at first call; the use of 
return-cards distributed by postal carriers; data obtained from paid 
respondents; mailed questionnaires; the “cut-off”; incomplete lists of 
firms that could be perfected only at prohibitive cost) are worthy of the 
best efforts of statisticians.⁴ However complicated the theory of sampling 
may appear to be, the problem of predicting biases is by comparison 
much more complicated and less developed. There are yet no 
theories of bias in any way comparable to theories of sampling. There 
is an open field here for the joint efforts of statistician and psychologist. 


4 This thought has been voiced many times by my colleague Frederick F. Stephan, and by M. G. 
Kendall in his excellent article “On the future of statistics,” Journal of the Royal Statistical Society, 
vol. cv, 1942: pp. 69-91. 
An example of a quantitative and usable research into the corrections 
that must be applied to eliminate the biases that will be introduced 
into the figures obtained on average size of family, number of refrig- 
erators in use, washing machines, and other family characteristics, 
through failure to make second and third calls on families not at home 
at first call, was published by Hilgard and Payne.⁵ It should be possible 
to capitalize on a number of other experimental studies of bias, such as 
those that have been carried out for many years by the Department of 
Agriculture, as a result of which it is now possible to make accurate 
estimates of crop and livestock production from mailed enquiries in 
spite of a high degree of selectivity. This is possible through the use of 
control data provided by the federal census and other complete enumer- 
ations of agricultural items. Biases in these mailed enquiries usually 
arise from two main sources. First, the farms reporting are not repre- 
sentative of the universe of all farms. Second, there is understatement 
or overstatement by the respondents, deliberate or unintentional, but 
fortunately fairly constant with respect to a particular group of re- 
spondents. When returns can be obtained from approximately the 
same group of respondents in successive surveys, and if check data are 
available periodically, it is possible to measure the biases and to sub- 
tract their effects. 
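
One way in which such a correction might be carried out is sketched
here. The sketch assumes that the bias of a fixed group of respondents
acts as a constant proportion of the true figure, which is an
assumption of this illustration and not a statement of the method
actually used by the Department of Agriculture; the function names and
all figures are hypothetical.

```python
def response_ratio(reported_total, check_total):
    """Ratio of the figure from the returns to the check figure
    (e.g. a complete enumeration) for the same item and period."""
    if check_total <= 0:
        raise ValueError("check_total must be positive")
    return reported_total / check_total

def adjusted_estimate(reported_total, ratio):
    """Remove the assumed constant proportional bias from a later return."""
    return reported_total / ratio

# Hypothetical check period: the returns showed 80 where the check data showed 100.
ratio = response_ratio(reported_total=80.0, check_total=100.0)

# Hypothetical later survey of approximately the same respondents.
later = adjusted_estimate(reported_total=88.0, ratio=ratio)
print(round(ratio, 2), round(later, 1))
```

The adjustment is only as good as the assumption of constancy, which
is why periodic check data are needed to measure the ratio afresh.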

Articles by Gladys L. Palmer⁶ on the bias and variability of response 
in population and labor-force enquiries direct the reader to other 
studies in those fields. 


SOME ECONOMICS OF SAMPLE DESIGN 


Guiding principles in the economics of sample design. The statistician 
does not take chances with sampling errors; he has them under control 
and knows how much it will cost to reduce them to any desired degree. 
He knows that precision beyond what can actually be utilized in formu- 
lating action on the basis of the data is sheer waste of money. His 
guiding philosophy is a very practical one, namely, to minimize, in 
the long run, the net losses arising from two kinds of mistakes— 


i. Trying to cut corners by taking too small⁷ a sample, or a sample 


5 Ernest R. Hilgard and Stanley L. Payne, Public Opinion Quarterly, vol. 8, Summer 1944: pp. 
254-261. 

6 Gladys L. Palmer, “The reliability of response in labor-market inquiries,” Technical Paper No. 
22, Bureau of the Budget, 1942. “Factors in the variability of response in enumerative studies,” Journ. 
Am. Stat. Assoc., vol. 38, 1943, pp. 143-52. 

7 Size of sample is by no means the sole criterion for computing an error-band; the way the sample 
is taken is extremely important. But for a given sampling plan there can be various sizes of sample, 
some too small and some too big for the requirements. 
not of the best possible design, too often running into sampling 
errors that are troublesome. 


ii. Being too sure, by taking too big a sample,⁷ and too often 
getting more precision than needed, thus wasting funds and slow- 
ing up the work, running into the errors and biases that so often 
beset large operations. 


It is possible to avoid either of these mistakes completely, but only by 
running headlong into the other. Thus, one could always avoid the 
first kind of mistake by making a practice of taking samples that are 
much bigger than necessary, but by so doing he would run headlong 
into the second mistake. Likewise, one could always avoid the second 
kind of mistake by making a practice of cutting corners too close and 
taking samples that are too small. Either mistake is bad if it occurs too 
often. It is impossible to avoid both mistakes altogether; even an expert 
sampling man will on rare occasions commit one mistake or the other, 
but his net losses over a long period of time will be at a minimum. He 
strikes these minimum losses by means of his error-formulas and by 
developing new and more efficient sample-designs. 
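
The balance between the two mistakes can be illustrated by a simple
calculation. The text gives no explicit loss function; the sketch
below assumes, purely for illustration, that the cost of the survey
rises in proportion to the size of sample while the expected loss from
sampling error is proportional to the variance of the estimate. All
names and figures are hypothetical.

```python
def net_loss(n, cost_per_case, loss_per_unit_variance, unit_variance):
    """Assumed total loss: cost of the sample plus expected loss from error."""
    sampling_cost = cost_per_case * n
    expected_error_loss = loss_per_unit_variance * unit_variance / n
    return sampling_cost + expected_error_loss

def best_sample_size(cost_per_case, loss_per_unit_variance, unit_variance, max_n):
    """Size of sample, between 1 and max_n, giving the smallest assumed net loss."""
    return min(
        range(1, max_n + 1),
        key=lambda n: net_loss(n, cost_per_case, loss_per_unit_variance, unit_variance),
    )

# Hypothetical figures: 2 dollars per case, unit variance 4, and a loss of
# 50,000 dollars per unit of variance in the final estimate.
best = best_sample_size(2.0, 50_000.0, 4.0, max_n=5_000)
for n in (100, best, 1_000):
    print(n, round(net_loss(n, 2.0, 50_000.0, 4.0), 2))
```

Under these assumptions a sample much smaller than the best size loses
through sampling error (the first mistake), and a sample much bigger
loses through needless cost (the second mistake).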

Steps in planning a sample. A skeleton outline of the steps ordinarily 
taken in planning a sample will possibly clarify some of the foregoing 
material and may assist teachers of statistics in pointing out some of 
the problems met in the application of theory. In the first place a prob- 
lem exists in administration, policy, design, or pure research, and it is 
proposed that data from a survey or experiment would provide new 
information that would be helpful in formulating a rational solution of 
the problem. 


i. The first step is to decide what sort of survey or experiment, if 
any, would be useful or feasible, and whether the funds that are 
available are likely to be sufficient. Part of the problem is to find 
out what lists (of firms, households, blocks), maps, instructions, 
field force and other facilities can be found for the job. A respect- 
able sample can be drawn only if there are lists or maps which 
encompass the entire universe to be studied, and by which a good 
complete count could be taken if there were need of it. The statis- 
tician is expected to possess special skills in collecting and inter- 
preting data and designing experiments, and to possess or acquire 
knowledge (e.g.) of business practices and other fields, thus to be 
able to give opinions concerning what data could be collected and 
interpreted usefully, and what the costs would be. 


ii. The second step is to lay plans for reducing all the errors and 
biases⁸ that may afflict the survey. It may be decided that the 
data can be obtained without too much error for the purposes 
intended. On the other hand, in view of the burden of response, 
or expected difficulties in getting the desired information from the 
respondents, or for lack of proper lists, it may be decided that the 
survey is not feasible and should be abandoned. 


iii. The third step takes place if it is decided that the survey is 
feasible. This step is to decide what sampling errors can be tolerated 
and paid for (2, 5, 15 per cent).⁹ This problem is usually solved 
by exchange of ideas between administrator and statistician. The 
administrator who needs the information to be supplied by the 
survey ordinarily thinks in terms of absolute and complete reli- 
ability. He has heard of sampling errors, but has seldom heard of 
any other kind of difficulty in collecting or interpreting data. He 
thinks of sampling as a game of chance, not appreciating the fact 
that sampling errors are under control. He is responsible for admin- 
istering a program and cannot take chances. It is therefore natural 
for him to demand too big a sample or even a complete count, thus 
committing the second kind of mistake mentioned above. The stat- 
istician, on the other hand, being an expert in foreseeing all kinds 
of errors and biases, knows that absolute accuracy is a myth, and 
arrives at a more practical view of the actual requirements of the 
precision to be sought. 

Sampling is both art and science, and in the negotiations that 
take place in this step art is often the more important of the two. 
Personality, integrity, dogged perseverance, knowledge of the 
subject-matter and sympathetic understanding of the other man’s 
point of view all lend confidence to sampling. Oftentimes what is 
needed is not so much confidence in sampling as confidence in the 
sampling man. 


8 Some of these are variability in response, bias of the interviewer, bias of the auspices, imperfections 
in the questionnaire, changes that take place in the universe before the data are used, bias from 
non-response and late response, an unrepresentative date for the survey, an unrepresentative selection 
of respondents, processing errors, and errors in interpretation. Practically all of these are more pro- 
nounced for complete counts than for samples. A more detailed list was attempted by the author in an 
article entitled “On errors in surveys,” in the American Sociological Review, vol. ix, 1944: pp. 359-369. 
It is a pleasure to express my indebtedness to Dr. Philip M. Hauser, Assistant Director of the Census, 
for several enlightening talks in regard to Step ii. 

9 Thus, in the sample censuses of Congested Production Areas, taken in the spring of 1944, it was 
decided that a population count within three per cent for any area would serve the needs of the commit- 
tee on Congested Production Areas. This requirement called for a coefficient of variation of one per cent, 
and the plans were drawn up accordingly. Analysis of the returns showed that this precision was met, 
but not greatly overdone. As it turned out, the coefficients of variation ranged all the way from .64 to 
.92. Mistakes of the first kind were thus avoided in all the nine areas wherein samples were taken, and it 
would not have been comfortable to strike closer to the second kind. 
iv. The fourth step is to design the sample to meet the allowable 
band of sampling error at the lowest possible cost. New theory 
may be needed, and theory may point to the advisability of pro- 
curing new facilities (new maps or lists) to achieve gains in pre- 
cision at lowered costs. The sample drawn will be scrutinized and 
tested in many ways to acquire knowledge regarding the expected 
sampling errors. One way of testing is to project the sample from 
one past survey to another, but this kind of test cannot be applied 
unless data from previous complete counts are on hand (such as 
the Census of Business, Agriculture, Population, etc.). See, how- 
ever, the penultimate paragraph regarding tests of samples. 


v. The fifth step is the operational one of putting the sampling 
plans into practice. Naturally the statistician in charge of sampling 
will take part in drawing up the questionnaire, a job to which his 
quantitative studies in the causes of errors and biases are espe- 
cially applicable. The instructions for the field and office pro- 
cedures are his responsibility so far as they affect the sampling 
procedures. Equally, the allowable detail in the tabulation plans 
is largely up to him. 


vi. The sixth step is the interpretation of the data, which may 
take the form of a professional article or a brief in nontechnical 
language for administrative use, showing the conclusions and 
recommendations. The statistician in charge of sampling has a 
heavy responsibility in the conclusions drawn from the data, 
because the permissible conclusions are dependent on the errors 
in the data, and this is his special province. 


SOME FURTHER CONSIDERATIONS BEARING ON TRAINING IN SAMPLING 


A distribution is not an intrinsic property of a universe. Sampling is 
partly art and partly science, as I have already said. In elementary 
courses in statistics everyone studies about distributions and samples 
drawn from them. The student is not usually reminded, however, that 
no distribution is an intrinsic property of a universe, but is the result 
of doing something to it (such as listing residence or business addresses), 
and that what is actually done depends on how the instructions are 
written. As an example, consider the distribution of the number of in- 
habitants per dwelling place in a city, as listed by a corps of listers. 
A dwelling place might be defined as an address appearing to the lister 
to contain not more than three dwelling units. At an address appearing 
to contain four or more dwelling units, each dwelling unit would then 
be listed as a separate dwelling place.¹⁰ It is a fact that the distribution 


10 These were the intentions of the instructions in the sample censuses of Congested Production 
Areas taken in the spring of 1944. 
obtained for (say) Charleston, and hence the precision of the population 
count obtained by enumerating the inhabitants of a sample of dwelling 
places (every k-th one on the list) is not an intrinsic property of Charles- 
ton but depends to a large extent on the way the definition of a dwell- 
ing place is expressed in words and how the instructions to the listers 
and supervisors are written. Seemingly insignificant factors will be im- 
portant, such as the amount of space between lines on the listing form, 
the pay of the listers, whether they need the money or are doing the 
job in response to civic appeal, how many supervisors can be spared 
from Washington, the weather, whether the listing is all to be done by 
daylight, etc. Be it noted that such factors influence complete counts as 
well as samples. 
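
The dependence of precision on the rule of listing can be seen in a
small made-up example. The addresses and their inhabitants below are
hypothetical, and enumerating every possible starting point of an
every-k-th sample is merely a convenient device for showing the spread
of the estimates; it is not a description of the procedures followed
in any actual census.

```python
from statistics import pstdev

def list_by_address(addresses):
    """One line on the listing form per address, whatever its size."""
    return [sum(units) for units in addresses]

def list_by_rule(addresses, max_units=3):
    """An address of up to max_units dwelling units is one dwelling place;
    at a larger address each dwelling unit is listed separately."""
    listing = []
    for units in addresses:
        if len(units) <= max_units:
            listing.append(sum(units))
        else:
            listing.extend(units)
    return listing

def all_systematic_estimates(listing, k):
    """Estimated population from each of the k possible every-k-th samples."""
    return [k * sum(listing[start::k]) for start in range(k)]

# Hypothetical street: inhabitants of each dwelling unit, grouped by address.
addresses = [[3], [5], [2, 4], [4, 3, 5, 4, 3, 5], [4], [3],
             [2, 2, 2], [2], [4], [5, 4, 3, 4, 4], [3], [5]]

for lister in (list_by_address, list_by_rule):
    estimates = all_systematic_estimates(lister(addresses), k=3)
    print(lister.__name__, estimates, round(pstdev(estimates), 1))
```

The population is the same under both rules and both sets of estimates
average to the true count, but the spread of the estimates differs
widely according to how the dwelling place is defined.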

Thus, in conjecturing on the numerical values of some of the con- 
stants in his formula for the error-band in a proposed sample design, 
the sampling man is not dealing entirely with intrinsic properties of the 
universe, but is conjecturing partly on the behavior of the people who 
will carry out the plans. 

Remarks regarding complete counts. Training in sampling should teach 
respect for a complete count in its proper place. The statistician does 
not indulge in sampling promiscuously. His guiding philosophy is to 
minimize the two sources of error mentioned earlier, and he sometimes 
finds mathematically that the required sample is a complete count. He 
takes a complete count when it can be done cheaper and quicker than 
a smaller sample for providing the precision that is required. 

A complete count is needed once in a while for efficient sampling be- 
tween complete counts. As a matter of fact, the sampling expert is one 
of the most ardent advocates of a complete count at intervals not too 
far apart, such as the decennial census, not only for its myriad of local 
detail for marketing and sociological uses but also for its block statistics 
and figures for various small segments of the population, all of which 
are useful in designing efficient samples of population, agriculture, and 
business. 

Another thing. When a sampling expert speaks of a complete count 
he means a complete count. He knows the dangers that lurk in nonresponse, 
and he knows that the main source of error in the results published 
from a sample often lies in the incompleteness of a previous complete 
count which is used as the base for expansion. He therefore wants 
not only great detail but energetic follow-up in a complete count so 
that it is as complete in proportion as his samples must be. 

Remarks regarding tests of samples. The ultimate aim in the theory 
and practice of sampling should be efficiency in the sense of the great- 
est amount of useful information per dollar, not per case (person, house- 
hold, firm). This point is often obscured in the classroom and unfortu- 
nately also in some present-day business-practice. 
Statistical information should be regarded as a product. It is an intangible 
product, to be sure, but a product it is nevertheless. With a 
tangible material product it is sometimes possible to perform tests to 
determine whether the article meets certain requirements that will in- 
sure its utility. An example is furnished by the tensile strength of wire: 
if the wire holds up under certain stress, it will hold up under that 
stress in the future, provided it does not corrode, rust, crystallize, or 
change character otherwise under conditions of service. On the other 
hand, there are tangible products for which no one has yet learned to 
specify tests that will insure the characteristics that are desired in 
service. An example is the wearing qualities of leather, although the 
picture is rapidly changing with improved tests and molecular theories. 

It is likewise in the collection of statistics. It is difficult, nay im- 
possible, in our present state of ignorance regarding the mechanisms by 
which biases are produced, to decide from the results of a single survey 
or even from a series of surveys of the past that any method of partial 
investigation is good in the sense of its product being dependable in the 
future. One must be concerned chiefly with the procedures by which 
the statistics were collected and processed. It is necessary to look to 
the sample design, the devices by which the universe was listed, the 
effect of nonresponse or failure to cover certain segments of the uni- 
verse, the questionnaire, the training of the field force, the instructions, 
the auspices, the office processing, the coding errors, and all else in the 
machinery that finally puts the figures into the tables. If all these 
aspects of a survey seem to have been adequately thought through, 
and if a few trial results come up to expectations, a great deal of con- 
fidence can be placed in the method. Any departure from accepted pro- 
cedures of sampling puts a severe burden of proof on the proponents of 
such departures, and this will continue to be so until biases have been 
studied with the same thoroughness and success as sampling errors have 
been studied and controlled. 

The teaching of statistics. As Hotelling has emphasized, improvement 
in the teaching of statistics is contingent upon better training for 
teachers of statistics, which can hardly be looked for on an important 
scale until statistics ceases to be a side-line in universities and acquires 
the same recognition accorded to mathematics, physics, economics, 
etc. His paper “On the teaching of statistics,” in the Annals of Mathe- 
matical Statistics for Dec. 1940 (pp. 1-14) should be read by anyone 
interested in the improvement of the profession of statistics. This is 
now followed by a sequel in the Biometrics Bulletin for April 1945 
(pp. 22, 23). 




















ESTIMATE OF SERIES E BOND PURCHASES BY 
FARMERS 


ALVIN S. TOSTLEBE 
Bureau of Agricultural Economics 


BY GREATLY increasing cash farm income, the war has intensified 
the need for new series that will reflect changes in special aspects of 
the financial condition of farmers. From the beginning of the war more 
than ordinary interest has followed changes in the value of farmers’ 
physical assets like real estate and livestock, and changes in farm debt. 
It is quite generally recognized that even more pronounced changes 
may be occurring in farmers’ holdings of certain financial assets. During 
this period of unusually high cash farm income, restricted markets, and 
enlarged opportunities and obligations to invest in securities of the 
Federal Government, it is likely that the pattern of the disposition of 
farm income has been altered considerably and is contributing to the 
rapid growth of financial assets, like currency, bank deposits, and 
Government bonds. 

These financial assets have special significance at this time as they 
are almost completely liquid and would suffer no depreciation if a re- 
cession of prices from wartime levels were to reduce the values of the 
physical assets. They will, therefore, have an important influence on 
the ability of farmers to adjust their operations to postwar conditions 
and they will constitute the farmers’ best hedge against a possible de- 
flation. 

To obtain a somewhat more complete and useful picture of the de- 
cided changes which were being wrought by the war in the financial 
condition of farmers, the Bureau of Agricultural Economics last year 
undertook to estimate the volume of currency, bank deposits, and 
Government bonds held by farmers on January 1, 1940 and annually 
thereafter. This paper is concerned with the Bureau’s estimate of the 
volume of United States savings bonds (Series E) that have been 
bought by farmers in the United States (Table 1). It is believed that at 
least 92 per cent of all United States savings bonds purchased by 
farmers are of this series. Elsewhere these amounts have been presented 
as the basis of estimates of the value of United States savings bonds 
held by farmers.¹ The purposes of this paper are (1) to describe the 


1 For estimates of the value of United States savings bonds held by farmers on selected dates, see 
the following publications of the U. S. Bur. of Agr. Econ.: Wartime Changes in the Financial Structure of 
Agriculture, Misc. Pub. No. 558, 1945; The Impact of the War on the Financial Structure of Agriculture 
(in press); The Balance Sheet of Agriculture, 1945; all by A. S. Tostlebe, D. C. Horton, R. J. Burroughs, 
and others. 
TABLE 1 

COST OF FARMER BOND PURCHASES: U. S. 
Savings bonds Series E: Estimated by regions and as percentage of total purchased 
July 1, 1941 to December 31, 1944 

                                                1941
Region                                   (July-Dec.)        1942        1943        1944       Total
                                          1,000 dol.  1,000 dol.  1,000 dol.  1,000 dol.  1,000 dol.
North Atlantic*                               11,490      73,420     118,438     149,993     353,341
South Atlantic†                               11,574      78,072     134,968     171,658     396,272
South Central‡                                11,516      82,978     161,654     196,108     452,256
Lake States§                                   9,503      70,412     140,601     168,886     389,402
Corn Belt¶                                    16,148     137,994     257,484     317,608     729,234
Texas-Oklahoma                                 7,790      49,190     101,939     132,146     291,065
Great Plains**                                 5,413      43,187     109,528     150,602     308,730
Mountain††                                     3,582      22,455      51,632      69,602     147,271
Pacific‡‡                                      7,867      52,132      88,746     115,306     264,051
UNITED STATES                                 84,883     609,840   1,164,990   1,471,909   3,331,622
Farmer purchases as percentage of total          8.4        10.5        11.6        11.8        11.4

* Maine, N. H., Vt., Mass., R. I., Conn., N. Y., N. J., and Pa. 
† Del., Md., Va., W. Va., N. C., S. C., Ga., and Fla. 
‡ Ky., Tenn., Ala., Miss., Ark., and La. 
§ Mich., Wis., and Minn. 
¶ Ohio, Ind., Ill., Mo., and Iowa. 
** N. Dak., S. Dak., Kans., and Nebr. 
†† Mont., Idaho, Wyo., Colo., N. Mex., Ariz., Utah, and Nev. 
‡‡ Wash., Oreg., and Calif. 


method used in estimating the volume of bonds purchased, (2) to 
examine the basic assumption upon whose general validity the trust- 
worthiness of the estimates depends, and (3) to compare results ob- 
tained by this method with those from the very few surveys and reports 
covering limited areas which are now available. 


METHOD OF ESTIMATE 


No attempt has been made by the Department of the Treasury to 
maintain records of bond sales by type of purchaser but its records show 
the sales of Series E bonds in each county from July 1, 1941 to date. 
The problem was to devise a method by which the farmers’ proportion 
of these sales might be estimated. 

As a first step, a selection of counties was made for use as a sample. 
Requirements for inclusion were (1) farmers constituted more than 50 
per cent of the population, (2) there was no city with population so 
large as 15,000, and (3) population growth had not exceeded 2 per cent 
between 1940 and 1943. This threefold screen was designed to leave 
sample counties for which the assumption could reasonably be made 
that per capita purchases of Series E bonds by farmers, on the average, 
equal those of nonfarmers. Granting this assumption, the bond sales 
in sample counties as reported by the Treasury could be used as a basis 
for estimating the bond purchases of farmers. The screen provided 
1,397 sample counties to which were added 33 counties that approxi- 
mated the requirements of the sample (Table 2). The reasons for these 
additions are indicated below. 
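
The threefold screen can be written out as a simple test applied to
each county. The sketch below is only an illustration: the record
layout, the field names, and the four counties are hypothetical, and
the Bureau's own working procedure is not described in that detail
here.

```python
from dataclasses import dataclass

@dataclass
class County:
    name: str
    farm_share: float        # farm population as a fraction of total population
    largest_city: int        # population of the largest city in the county
    growth_1940_1943: float  # change in population, 1940 to 1943, in per cent

def passes_screen(county):
    """Apply the three requirements for inclusion in the sample."""
    return (
        county.farm_share > 0.50              # (1) farmers more than 50 per cent
        and county.largest_city < 15_000      # (2) no city so large as 15,000
        and county.growth_1940_1943 <= 2.0    # (3) growth not over 2 per cent
    )

# Hypothetical counties, one passing and one failing on each requirement.
counties = [
    County("A", farm_share=0.62, largest_city=4_000, growth_1940_1943=-3.5),
    County("B", farm_share=0.41, largest_city=6_000, growth_1940_1943=-1.0),
    County("C", farm_share=0.55, largest_city=22_000, growth_1940_1943=0.5),
    County("D", farm_share=0.70, largest_city=2_500, growth_1940_1943=6.0),
]

sample = [c.name for c in counties if passes_screen(c)]
print(sample)
```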

The method consisted of applying the per capita sales of Series E 
bonds in the combined sample counties of each crop-reporting district 
to the farm population of those districts. The estimate was made by 
crop-reporting districts because of the wide variation in types of farm- 
ing and in circumstances of farmers in different parts of many States. 
By making the crop-reporting district the unit for estimating sales to 
farmers, the per capita sales in the sample counties were weighted by 
the farm population living in an area in which farming conditions were 
presumably similar. That conditions are approximately similar in most 
crop-reporting districts is due to the fact that, in general, the bound- 
aries have been drawn with an eye to making these districts as homo- 
geneous in type of farming as possible. This is facilitated by the limited 
area included in each district. Typically, the States are divided into 
nine districts, although a few large States contain more, and in small 
States as few as one is found. 

Having made the estimates for crop-reporting districts, summariza- 
tion by States and for the United States was a simple process. 
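
The arithmetic of the method may be sketched as follows. The figures
are hypothetical, as are the function and field names; the sketch
assumes that "per capita sales in the combined sample counties" means
total sales divided by total population of the sample counties in the
district.

```python
def district_estimate(sample_counties, district_farm_population):
    """Estimated farmer purchases in one crop-reporting district."""
    total_sales = sum(c["sales"] for c in sample_counties)
    total_population = sum(c["population"] for c in sample_counties)
    per_capita_sales = total_sales / total_population
    # Per capita sales in the sample counties, applied to every farm
    # resident of the district under the basic assumption.
    return per_capita_sales * district_farm_population

# Hypothetical State with two crop-reporting districts (sales in dollars).
districts = {
    "District 1": {
        "sample_counties": [
            {"sales": 120_000, "population": 10_000},
            {"sales": 80_000, "population": 10_000},
        ],
        "farm_population": 30_000,
    },
    "District 2": {
        "sample_counties": [{"sales": 90_000, "population": 15_000}],
        "farm_population": 20_000,
    },
}

estimates = {
    name: district_estimate(d["sample_counties"], d["farm_population"])
    for name, d in districts.items()
}
state_total = sum(estimates.values())
print(estimates, state_total)
```

Summation over the districts of a State, and over the States, then
gives the State and national estimates.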


SELECTION OF SAMPLE COUNTIES 


Sample counties were limited to those with farm populations of 50 
per cent or more in order to make sure that influences which affect bond 
purchases of individuals—income, attitudes toward spending and sav- 
ing, exposure to bond-selling campaigns, etc.—would be as similar as 
possible for farmers and nonfarmers. A higher percentage, possibly 60 
or 70, might have been preferable for then the influence of purchases 
actually made by farmers would have been greater. Moreover, if quotas 
are to be reached in counties with very high proportions of farmers, 
bond-selling campaigns generally must be conducted among farmers as 
intensively as among other more accessible groups. These considera- 
tions, however, are at least partially neutralized by the fact that farm- 
ers’ income and their contact with bond-selling campaigns in such 
counties are not necessarily typical of counties less rural in nature. The 
case is not clearly in favor of using only counties with very high per- 
centages of farm population but the chief reason for including counties 
with farm population as low as 50 per cent was that the higher figures 
would have left most States without any sample counties or with too 
few to be adequate. As it was, several large States with important agri- 
cultural areas did not have counties that met the 50 per cent test and 
special methods had to be devised to estimate bond purchases of farmers 
in these. On the other hand, to have lowered the measure below 50 
per cent would have opened the door to increasing degrees of urban 
influence on bond buying. Fifty per cent, therefore, represented a prac- 
tical compromise which would permit the use of the method widely 
without spoiling the results by bringing counties into the sample in 
which the bond sales were influenced predominantly by urban condi- 
tions not approximated also on the farms. 

The same consideration suggested the wisdom of barring counties in 
which more than 50 per cent of the people were farmers but in which a 
larger city was located. Such situations are not common, but nine 
counties in five Southern States were rejected for this reason. 

Counties in which the population had increased between 1940 and 
1943 by more than 2 per cent were barred from the sample because, in 
view of the almost universal reduction in farm populations, such growth 
is a strong indication of the probable presence and expansion of some 
urban war-industry or a developing military area which attracts civil- 
ians who are not farmers. Forty-four counties, otherwise acceptable, 
were barred for this reason. 

In view of substantial evidence that during this period farm popula- 
tion had decreased markedly everywhere outside of a few New England 
States it is possible that this screen should have been set to eliminate all 
counties in which any growth of population had occurred, or even to 
eliminate any that did not show a moderate decline. The sample con- 
tains 19 counties in 11 States which experienced population growth— 
but less than 2 per cent. This is 1.3 per cent of the total number of 
sample counties for the entire country. If the growth in population in 
rural counties was due to an increase in nonfarm war workers who re- 
ceived high wages and participated in pay-roll deduction plans for 
buying bonds, the estimate received a slight upward bias by their in- 
clusion. On the other hand, to the extent that the growth was due to an 
influx of persons attracted by a military camp, some downward bias 
may actually have resulted. In any case, the error in the estimate for 
the entire country from this source is probably rather negligible.


BASIC ASSUMPTION EXAMINED 
What can be said in support of the assumption that the per capita 
bond purchases of farmers will equal those of nonfarmers in counties
that meet the threefold test? First, in such counties the general prosper- 
ity of the nonfarm population usually is closely related to the prosperity 
of the farmers. The townspeople are largely engaged in servicing farm- 
ers by marketing their goods and by selling them most of the things 
which they buy. There is thus a strong tendency for the prosperity of 
both groups to rise and to fall together. 

Then, because of the close relationships between the groups—busi- 
ness, social, and cultural—the differences in economic opportunities 
tend to be minimized. Frequent intermingling of farmers and others, 
and many personal contacts, make for recognition and understanding of 
the opportunities in each field. Hence, when farm population predomi- 
nates and urban communities are small it seems likely that differences 
in wealth and in income between farmers and nonfarmers will be no 
more pronounced than among the members of each individual group. 
In counties like these there may be wealthy and poor farmers and like- 
wise wealthy and poor residents of villages and small cities. It is not 
likely that the farmers as a group will be decidedly better or worse off 
than the nonfarmers. 

Finally, it is likely that in counties which meet the threefold test 
the bond-selling efforts will be directed to farmers at least as much as 
to the more accessible nonfarm group. In counties that contain only 
small urban communities the nonfarm groups are not likely to be sub- 
jected to bond-selling campaigns that are so well organized and imple- 
mented as in the larger urban centers. Moreover, a larger proportion 
of a rural county’s quota must be raised from farmers. Hence where 
industrial or business population does not predominate, sales pressure 
is apt to be more nearly the same for all groups. Even so, it is probable 
that in the earlier bond-selling campaigns a greater proportion of the 
selling effort was directed to the urban groups than was true later. For 
this reason it may well be that the estimates for 1940-42 are relatively 
high compared with 1943 and 1944. 

The effectiveness of bond-selling drives among farmers assumes par- 
ticular importance when it is recalled that even when the incomes of 
farmers and nonfarmers tend to be similar the disposition of these in- 
comes may be different because of particular circumstances characteris- 
tic of each group. For example, the nature of farming is such that, on 
the average, farmers are likely to carry a heavier debt than is customary 
among the nonfarm group. Then, more generally than others, farmers 
can invest surplus funds in their own business. Thus, although the 
amounts of income above immediate needs for business and living may 
tend to be similar for farmers and nonfarmers in rural counties, normally
these surpluses are channeled differently. The savings of farmers
tend to be invested in physical assets or used to retire debt, whereas 
those of nonfarmers more frequently seek investment in government 
and corporate securities. In wartime, however, this normal disposition 
of savings may be greatly modified by shortages in manufactured 
goods, by patriotic considerations, and by war-bond drives. 

In the calculation of per capita bond sales in sample counties, as 
well as in the application of these results to the farm populations of 
crop-reporting districts, population data from the 1940 enumeration 
of the Bureau of the Census were used. Later estimates of civilian popu- 
lation by counties are available, but not by farm and nonfarm groups. 
Because of unusually large-scale shifts in population throughout the 
country during the period covered by this estimate it is desirable to con- 
sider the extent of error that may have been introduced from this source. 

In general, the sample counties have been among the heaviest losers 
of population during the war years. War industries have not usually 
developed much in the villages and small cities of these counties, and 
population has been lost to more urbanized counties as well as to the 
armed services. Because of these losses, the division of total bond pur- 
chases in the sample counties for (say) 1943, by their population in 
1940, resulted in a per capita amount considerably lower than would 
have been obtained had the estimate of population for 1943 been used. 

But whatever error may have arisen from this source served to offset, 
at least in part, the error which may have arisen from the application of 
per capita purchases in sample counties to the 1940 farm population 
of the respective crop-reporting districts—a procedure made necessary 
by the fact that no estimates of farm population by counties for later 
years are available. If the percentage decline in farm population of a 
crop-reporting district equaled exactly the percentage decline in total 
population of its sample counties, the errors introduced by the use of 
1940 population data would cancel completely. If the percentage decline 
in farm population throughout the districts was larger than that of 
total population in the sample counties (which is likely), the estimates 
for the later years have been given some upward bias through the con- 
tinued use of the 1940 population data. But the bias is less than it 
would have been had the total bond sales in the sample counties during 
the later years been divided by the later estimates of their total popula- 
tions. As indicated above, there is more or less cancellation of error if 
1940 population data which are necessarily used in the one operation 
are used also in the other. 
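
The cancellation argument can be checked with a little arithmetic. The sketch below uses hypothetical sales and population figures, with function and variable names invented for the illustration. It compares the estimate built wholly on 1940 data with the one that would result if current populations were known for both operations, and with the mixed case in which only the divisor is brought up to date.

```python
def district_estimate(sample_sales, sample_pop, district_farm_pop):
    """Per capita sales in the sample counties applied to the district's farm population."""
    return sample_sales / sample_pop * district_farm_pop

# Hypothetical figures, for illustration only.
sales_1943 = 1_000_000.0    # total bond sales in the sample counties, 1943
sample_pop_1940 = 50_000.0  # total population of the sample counties, 1940
farm_pop_1940 = 200_000.0   # farm population of the crop-reporting district, 1940

def estimate_pair(sample_decline, farm_decline):
    """Return (estimate from 1940 data, estimate from current populations)."""
    with_1940 = district_estimate(sales_1943, sample_pop_1940, farm_pop_1940)
    current = district_estimate(
        sales_1943,
        sample_pop_1940 * (1 - sample_decline),
        farm_pop_1940 * (1 - farm_decline),
    )
    return with_1940, current

equal_case = estimate_pair(0.10, 0.10)    # equal percentage declines: errors cancel
unequal_case = estimate_pair(0.10, 0.15)  # farm population fell faster: upward bias
# Mixed case: later population in the divisor, 1940 farm population in the multiplier.
mixed = district_estimate(sales_1943, sample_pop_1940 * 0.90, farm_pop_1940)
print(equal_case, unequal_case, mixed)
```

With equal declines the two estimates coincide. With a larger farm decline the 1940-based figure is too high, but it is still lower than the mixed figure.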














METHOD OF ESTIMATE IN NORTH ATLANTIC, 
MOUNTAIN, AND PACIFIC AREAS 


For most States outside the North Atlantic, Mountain, and Pacific 
areas, it was possible to adhere to the rule that not less than two coun- 
ties should constitute the sample for each crop-reporting district. When 
at least two counties that would qualify as samples were not found in a 
district, two or more crop-reporting districts were combined if such 
combination provided at least two sample counties for the enlarged 
district, or the boundaries of the crop-reporting districts were altered 
somewhat to remove a sample county from a district that had an excess 
of sample counties to an adjoining district that had a deficiency. In a 
few instances the requirements of the sample were bent slightly to 
permit a county to qualify. In rare cases the bond sales in a contiguous 
sample county of an adjoining State were used as a partial basis for 
estimating the bond purchases in a section of a State especially de- 
ficient in sample counties. This procedure was avoided, wherever pos- 
sible, because of the error that might be introduced by possible differ- 
ences among the States in the energy with which the war-bond cam- 
paigns have been conducted among farmers. 

Delaware, Maryland, and the North Atlantic States (Vermont ex- 
cepted) constitute an area so highly industrialized, or with such large 
areas unsuitable for farming, that practically no counties could be 
found that conformed to the standard set for the sample. These States 
had to be treated separately and in such a way as seemed best under the 
individual circumstances of each State. Some of these States are small 
and contain relatively unimportant segments of agriculture. Inability 
to estimate closely the farmer purchases of bonds in these States would 
not seriously affect the estimate for the entire country. Others, notably 
New York and Pennsylvania, are large and although extensively indus- 
trialized have retained an important agricultural position. Some of the 
Mountain and Pacific States presented similar problems. 

Several possible procedures for these special cases were considered. 
The possibility of modifying the sample screen in these industrialized 
States, by using as sample counties those with farm populations of less 
than 50 per cent, suggested itself. There is, of course, no magic line at 
50 per cent which renders counties having a farm population slightly 
higher than that, acceptable samples, and counties with a smaller pro- 
portion, unacceptable. But when the farm population of a county falls 
below 50 per cent there is reason to suppose that industrial and urban 
influences dominate and become increasingly influential as the proportion
of this nonfarm element becomes greater. Inevitably urban incomes,
attitudes, and bond-selling plans exercise the dominant influence,
and per capita sales in such counties will be strongly influenced
by them. Nevertheless in three States—California, New York, and 
Pennsylvania—it was decided to modify the general rule and to use 
sample counties with less than 50 per cent farm population. 

To prevent excessive errors introduced by urban influence the sample 
was severely limited to a few counties whose farm population was 
nearest to the 50 per cent minimum. Thus only five sample counties 
were selected in New York, five in California, and six in Pennsylvania. 
Because of this very limited sample it seemed desirable to make allow- 
ance for possible differences in the economic condition of farmers in 
the sample counties and in the rest of the State. The per capita bond 
purchases were therefore adjusted according to the relation of the 
average value of land and buildings per farm in the sample counties and 
in the respective areas of the State to which the bond-selling experience 
of the sample was applied. 
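
The text does not give the form of this adjustment. The sketch below assumes a simple proportional one, in which per capita purchases in the sample counties are scaled by the ratio of average farm values. The function name and all figures are hypothetical.

```python
def adjusted_per_capita(per_capita_sample, value_per_farm_sample, value_per_farm_area):
    """Scale sample per capita purchases by the ratio of average farm values.

    Assumption: the adjustment is a simple ratio; the source does not state its form.
    """
    return per_capita_sample * value_per_farm_area / value_per_farm_sample

# Hypothetical figures, for illustration only.
per_capita = 40.0       # dollars per person in the sample counties
value_sample = 8_000.0  # average value of land and buildings per farm, sample counties
value_area = 6_000.0    # the same average for the area to which the sample is applied
adjusted = adjusted_per_capita(per_capita, value_sample, value_area)
print(adjusted)
```

Under this assumption an area whose farms average three-fourths the value of those in the sample counties is assigned three-fourths of their per capita purchases.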

In the North Atlantic group, Vermont alone has counties that meet 
the requirements of the sample. In the remaining States of that group 
industrialization has gone so far, or other developments curtailing agri- 
culture are so prevalent, that it appeared unwise to attempt an estimate 
even on the basis of their least urbanized counties. Instead, the sales 
experience of Addison and Orange Counties, Vermont,² was applied,
with appropriate adjustment to the differences in the average value of 
land and buildings in farms, to every New England State except Maine. 
Even in Maine, the experience of Addison and Orange Counties was 
applied to two districts where the type of farming was similar to that 
of the Vermont counties, while to the remaining district the bond- 
selling experience of a similar area in Michigan was applied. In New 
Jersey, the bond-sales experience of sample counties in Maryland, 
Michigan, and Wisconsin was used after being adjusted to differences 
in value of land and buildings in farms. In a very few other instances 
and in a limited way sample counties from without a State were applied 
to contiguous territory. 

Probably the greatest risk in applying samples from areas outside 
the State is found in the fact that the various States have not carried 
their bond-selling campaigns to farmers with equal vigor. In some 
States it may have appeared wiser to the State and county war finance 
committees to press the campaign with particular energy in the industrial
centers with a consequent neglect of farmers. In other States
greater emphasis may have been laid on selling bonds to farmers. Such
differences tended to be especially pronounced in the earlier campaigns.
This factor cannot be measured, and allowance cannot so readily be
made for it as, for instance, in cases in which differences in wealth and
income exist. The risk of error from this source and from other noneconomic
influences which vary considerably from State to State
should be taken only when urbanization and other factors make it impossible
to find sample counties which would more closely reflect
farmers’ purchases in the State.

² Grand Isle County met the specifications of the sample and was used in making the Vermont
estimate. It was not used, however, in estimating bond purchases in other New England States because
it seemed less typical of that region.

In the Mountain States are numerous crop-reporting districts that 
did not have counties which meet the requirements of the sample. 
Therefore modifications were made in the details of the method which 
in a particular situation involved as little distortion as possible. The 
more common modifications may be illustrated by the procedure used 
in the case of Utah. This State has 29 counties, many containing very 
dissimilar types of farming. They are divided into four crop-reporting 
districts which, surprisingly, are numbered 1, 5, 6, and 7. The four 
counties in the State which meet the qualifications for sample counties 
are located on or near the eastern border in district 6. 

District 1, comprising eight counties in the northwestern part of 
the State, is by far the most populous, and contains nearly one-half of 
the farm population of the entire State. Types of farming vary greatly
here. It was decided to use Box Elder and Morgan Counties as samples
although the proportion of farm to total population was only 43.4 per
cent in the former and 47.5 per cent in the latter. The types of farming 
in these counties were typical of the entire crop-reporting district. 

The seventh district, which extends two-thirds of the way across the 
southern part of the State, is very sparsely populated. This district 
was taken care of by extending its boundary to include a sample coun- 
ty, San Juan, from the sixth district. The types of farming in most of 
the seventh district and in San Juan County are the same. This gave to 
the expanded sixth district only one sample county. However, this 
county contained more than one-quarter of the farm population of the 
expanded district. 

The fifth district was combined with the remainder of the sixth, 
which included three sample counties containing nearly one-third of 
the farm population of the combined districts. In this combination the 
sample counties do not embrace types of farming as varied as in the 
entire area for which they became the sample. 

Such were the more important modifications that were made through- 
out the Mountain and Pacific States generally. Doubtless the estimate 
for any individual State among these is likely to be wide of the mark
and should never be used alone. The estimates for these entire regions 
probably contain a much wider margin of error than the estimates for 
those regions where sample counties are plentiful. The inclusion of these 
results with those from States in which little or no modification of the 
method was necessary is justified only because (1) there is likely to be 
some cancellation of error where State figures are combined and (2) 
the farm population of these States is a small fraction of the farm popu- 
lation of the United States. Thus very large errors in estimating the 
bond purchases of farmers in such individual States would have a lesser 
effect on the estimate for a region and a relatively small effect on the 
national estimate. 

Elsewhere, as in Ohio, Indiana, Illinois, Michigan, Maryland, and
Wisconsin, which in certain areas are highly industrialized, it was neces- 
sary to apply some of these modifications, but on a relatively small 
scale, in limited areas of the States. 


COMPARISON OF ESTIMATE WITH INDEPENDENT SURVEYS AND REPORTS 


So far, surveys and records of farmers’ purchases or ownership of 
United States savings bonds have been too limited or have themselves 
contained too large an element of estimate to provide a satisfactory 
check on the estimates obtained by the methods described here. But 
such as have been made, provide in a limited way some empirical 
grounds for confidence in the basic assumption of this method and of 
its results. 

Four surveys of the ownership of United States savings bonds among 
individuals in selected counties have been made for the Department of 
the Treasury. A few of these counties are also in the sample group of 
the project here described. Data obtained in those surveys made pos- 
sible the calculation of the farmers’ proportion of individually owned 
United States savings bonds in each county, as indicated by the inter- 
views taken. When these proportions were averaged for the sample 
counties in each survey, and were compared with the average percent- 
age of farm population in the same counties, the similarity was striking 
(Table 3). 

This comparison provides an empirical test of the validity of the 
basic assumption that, on the average, in sample counties farmers tend 
to buy Series E bonds in proportion to their numerical strength in the 
population. The test is far from conclusive, for the number of counties 
involved was too small and the interviews per county were too few to 
assure great accuracy. Moreover, the surveys dealt with ownership of 
bonds, and with Series F and G as well as E—factors which perhaps
do not detract greatly from the comparison where rural areas are in- 
volved. 

TABLE 3
FARM POPULATION AND FARMER-OWNED SAVINGS BONDS: U.S.

Percentage average for selected counties on specified dates, 1943 and 1944

                    Percentage farm population    Percentage of individually
Date                is of total population        owned United States savings
                    (1940 Census)                 bonds held by farmers§
                    Per cent                      Per cent

October 1943*       65                            66
February 1944†      65                            63
July 1944‡          61                            67
December 1944‡      61                            55
Average             63                            63

* Average of percentages of 8 counties in Iowa, Ill., La., Mich. (2), Nebr. (2), and W. Va.
† Average of percentages of 7 counties in S. C., Ill., Nebr., Ark., La., Mich., and Iowa.
‡ Average of percentages of 7 counties in Kans., S. C. (2), Ala., N. C., Minn., and Iowa.
§ Averages of results of scattered surveys for the Department of the Treasury.


Yet the fact that in each of the four surveys the average proportion 
of bonds owned by farmers in the sample counties was close to the 
average percentage of farmers in the population (despite the almost 
complete change in sample counties included in the several surveys, ex- 
cept the fourth) appears to be significant. It encourages the hope that 
additional and more comprehensive surveys will confirm the essential
reliability of the method. 
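
The averages in Table 3 can be reproduced directly from its four rows. The short Python check below uses only the tabulated percentages; the variable names are illustrative.

```python
# Figures taken from Table 3.
farm_pop_pct = [65, 65, 61, 61]     # farm population as per cent of total (1940 Census)
farmer_bond_pct = [66, 63, 67, 55]  # per cent of individually owned bonds held by farmers

avg_pop = sum(farm_pop_pct) / len(farm_pop_pct)
avg_bonds = sum(farmer_bond_pct) / len(farmer_bond_pct)
# Difference, survey by survey, between the bond share and the population share.
gaps = [b - p for p, b in zip(farm_pop_pct, farmer_bond_pct)]

print(avg_pop, avg_bonds)  # 63.0 62.75
print(gaps)                # [1, -2, 6, -6]
```

The two averages agree to the nearest whole per cent, although the individual surveys differ from the population share by as much as six points in either direction.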

A few State War Finance Committees have attempted, during vari- 
ous periods, to record sales to farmers separately. No committee claims 
to have obtained complete records, and in each instance the reported 
sales to farmers contain some element of estimate. Nevertheless, the 
recorded sales in certain counties of Alabama, and in the State of Wisconsin
are believed to be sufficiently comprehensive to warrant comparison
of their reported sales with estimates of the Bureau of Agricultural
Economics for comparable periods and areas.³

³ Bond sales to farmers are reported by the State War Finance Committees by war loan drives.
Comparable estimates by BAE are based on total bond sales in sample counties not only during bond
drives, but also during the interdrive periods and during two months preceding the first drive reported.
This extension of BAE’s estimate to include the interdrive periods plus two months preceding the first
bond drive for which a report was made seems necessary because of well recognized differences in the
bond-buying practices of farmers and nonfarmers. The former typically limit their purchases largely
to bond drives, whereas the latter tend to spread their purchases more evenly. Thus an estimate by
BAE for an individual month in which a bond drive occurred would probably understate farmers’
purchases of that month.

The Alabama War Finance Committee has reported bond sales to
farmers for 21, 31, and 30 of the State’s 67 counties for the third, fourth,
and fifth war-loan drives respectively. Among these, 12 are included 
in each report and in the Bureau’s group of sample counties. The com- 
bined sales reported for these 12 counties in the third, fourth, and fifth 
drives when reduced 5 per cent (as suggested by the committee) to 
eliminate all but Series E bonds, totaled 5.9 million dollars which is 
20.3 per cent less than the Bureau’s estimate of 7.4 million dollars for 
the comparable period. This difference does not appear too serious, 
partly because it pertains to results obtained from the sales experience 
of only one-fourth of the sample counties, but more especially because 
the Alabama War Finance Committee does not claim that all bond 
sales to farmers have been included in its tabulations. It believes “the 
amount reported is a minimum rather than a maximum of bonds pur- 
chased by farm people.” 

According to the records (partly estimated) of the State War Finance 
Committee of Wisconsin, farmers in that State paid 135 million dollars 
for bonds bought during the first six war loan drives. Reduced 4 per 
cent⁴ to eliminate other types, there remain 129.6 million dollars as the
sum paid by farmers for Series E bonds. The Bureau’s estimate for the 
comparable period is 114.6 million dollars; this is 11.6 per cent less 
than the amount reported by the State War Finance Committee. 
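
The percentage differences quoted for Alabama and Wisconsin follow from the dollar figures given in the text. The brief Python check below uses only those figures; the helper name `pct_below` is illustrative.

```python
def pct_below(value, reference):
    """Per cent by which value falls short of reference."""
    return (reference - value) / reference * 100

# Alabama: committee-reported sales (after the 5 per cent reduction) against the
# Bureau's estimate, both in millions of dollars.
alabama = pct_below(5.9, 7.4)

# Wisconsin: 135 million dollars reported, reduced 4 per cent to leave Series E bonds,
# against the Bureau's estimate of 114.6 million dollars.
wisconsin_series_e = 135 * (1 - 0.04)
wisconsin = pct_below(114.6, wisconsin_series_e)

print(round(alabama, 1), round(wisconsin_series_e, 1), round(wisconsin, 1))
```

Note that the two comparisons run in opposite directions: in Alabama the reported sales fall below the Bureau's estimate, while in Wisconsin the Bureau's estimate falls below the reported sales.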

The foregoing comparisons are too limited in number and scope to 
be considered more than rather scant evidence that the Bureau’s 
method of estimating bond purchases by farmers yields reasonably re- 
liable results when applied to areas larger than individual States. 
Further testing of the assumptions and results is desirable, but this
must await the appearance of additional trustworthy surveys and rec- 
ords with which reliable comparisons can be made. 


⁴ Wisconsin experience in the sixth war loan suggests this percentage.











COMPONENT INDEXES AS A BASIS FOR STRATIFICATION 
IN SAMPLING* 


By MARGARET JARMAN HAGOOD AND ELEANOR H. BERNERT
Bureau of Agricultural Economics 


THE PURPOSE of stratification in sampling is to utilize available
information on the units to be sampled in such a way as to assure
better representation with respect to certain characteristics of the units 
than would be expected from simple random sampling. The process of 
stratification involves first the choice of “control” characteristics and 
second some method of utilizing information on the control character- 
istics which will group the units to be sampled into strata, each contain- 
ing units relatively homogeneous with respect to the control character- 
istics. Both nonquantitative classifications, such as geographic location, 
and quantitative variables may be used as control characteristics for 
stratification. 

The criterion for choice of control characteristics is their relation to 
the item on which observations are to be made in the sample for the 
purpose of making estimates for the universe. The more closely a single 
control characteristic is correlated with the item to be estimated from 
the sample, the greater will be the improvement in efficiency of estima- 
tion from a sample stratified to assure representativeness with respect 
to that control characteristic over a sample drawn without any 
stratification. 
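
A small numerical sketch may make the point concrete. With proportional allocation, and ignoring the finite population correction, the variance of a stratified sample mean depends on the variance within strata, whereas that of a simple random sample mean depends on the variance of the whole universe. The nine-unit universe below is hypothetical and was constructed so that the item rises closely with the control.

```python
from statistics import pvariance

# Hypothetical universe of (control value, item value) pairs.
units = [(1, 10), (2, 12), (3, 11), (4, 20), (5, 22), (6, 21), (7, 30), (8, 31), (9, 32)]

def within_stratum_variance(strata):
    """Weighted average of the within-stratum variances of the item."""
    total = sum(len(s) for s in strata)
    return sum(len(s) / total * pvariance([y for _, y in s]) for s in strata)

ordered = sorted(units)                              # order the units by the control
strata = [ordered[i:i + 3] for i in range(0, 9, 3)]  # three strata of three units each
overall = pvariance([y for _, y in units])           # variance ignoring the strata
within = within_stratum_variance(strata)             # variance left after stratifying
ratio = within / overall
print(overall, within, ratio)
```

Here the within-stratum variance is about one per cent of the overall variance. A control unrelated to the item would leave the two nearly equal, and stratification on it would gain little.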

Several considerations complicate the problem of choice of control 
characteristics. After one control has been chosen, the simple criterion 
of highest correlation with the item to be estimated has to be modified 
in choice of the second and subsequent controls, since the interrelation- 
ships of the control characteristics have to be taken into account. In 
the straightforward case where a single item is to be estimated from the
sample, and its correlations with available control characteristics are 
known, along with their intercorrelations, the choice of successive con- 
trol characteristics is similar to the choice of successive estimating vari- 
ables in a multiple regression problem. 
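
The analogy with multiple regression can be illustrated by the standard formula for the squared multiple correlation of an item with two predictors, R² = (r₁² + r₂² − 2 r₁ r₂ r₁₂) / (1 − r₁₂²). The sketch below applies it to three hypothetical controls, labeled A, B, and C; the correlations are invented for the illustration and are not taken from the study.

```python
def r2_two(r_y1, r_y2, r_12):
    """Squared multiple correlation of an item with two controls."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

# Hypothetical correlations, for illustration only.
r_item = {"A": 0.70, "B": 0.65, "C": 0.50}  # each control with the item
r_between = {("A", "B"): 0.90, ("A", "C"): 0.10, ("B", "C"): 0.20}

def r_pair(a, b):
    """Look up the correlation between two controls in either order."""
    return r_between.get((a, b), r_between.get((b, a)))

# First control: the highest simple correlation with the item.
first = max(r_item, key=lambda k: abs(r_item[k]))
# Second control: the one giving the highest R-squared jointly with the first.
gains = {k: r2_two(r_item[first], r_item[k], r_pair(first, k)) for k in r_item if k != first}
second = max(gains, key=gains.get)
print(first, second, gains)
```

Control B has the higher simple correlation with the item, but because it largely duplicates A it adds little. C, being nearly independent of A, is the better second choice.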

In the more complex and far more common case in social and eco- 
nomic surveys, estimates are to be derived from the sample not for 
one, but for many items. Moreover, often no precise information is 
available as to the degree of correlation of the various items to be estimated
from the sample with the characteristics on which data are available
for possible use as controls. In such a case, judgments of persons
familiar with the subject matter may have to be relied on for determining
which of a group of possible controls are likely to be most closely
correlated with the items to be estimated from the sample.

* A paper presented at the 104th Annual Meeting of the American Statistical Association, Washington,
D. C., December 28, 1944.

When control characteristics have been selected, the next problem 
is to develop a method of utilizing information on the selected controls 
which will group the units into strata containing units as similar as 
possible with respect to the control characteristics. Indexes derived 
from component analysis of the intercorrelations of control character- 
istics which are quantitative variables are here proposed as a technique 
for stratification, and an example of their use is illustrated. 
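
As a rough indication of what such an index involves, the sketch below extracts the first component of a small correlation matrix by power iteration and uses its weights to score and group counties. It assumes a principal-component form of analysis, which this section does not spell out. The three correlations are those of variables 3, 5, and 9 in Table I; the standardized county values and the county labels are hypothetical.

```python
import math

# Intercorrelations of variables 3, 5, and 9 as given in Table I.
corr = [
    [1.000, -0.765, 0.773],
    [-0.765, 1.000, -0.815],
    [0.773, -0.815, 1.000],
]

def first_component(matrix, iterations=200):
    """Power iteration: return (largest eigenvalue, unit-length eigenvector)."""
    n = len(matrix)
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigenvalue, v

eigenvalue, weights = first_component(corr)

def component_index(z_scores):
    """Weighted sum of a county's standardized control values."""
    return sum(w * z for w, z in zip(weights, z_scores))

# Hypothetical standardized values of variables 3, 5, and 9 for four counties.
counties = {
    "P": [1.2, -1.0, 1.1],
    "Q": [-0.8, 0.9, -1.0],
    "R": [0.1, 0.0, 0.2],
    "S": [-1.5, 1.4, -1.3],
}
ranked = sorted(counties, key=lambda name: component_index(counties[name]))
strata = [ranked[:2], ranked[2:]]  # two strata of counties with neighbouring index values
print(round(eigenvalue, 3), ranked, strata)
```

The single index stands in for three closely related controls, so that counties can be ordered and cut into strata on one scale rather than cross-classified on each control separately.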

For certain field work of the Bureau of Agricultural Economics, a 
relatively small sample of counties was desired for use in continuing 
and periodic social and economic studies relating to the attitudes, folk- 
ways, institutions, population movements, and other behavior patterns 
of farm people. Because certain types of studies involve communities 
and other units larger than a single farm or farm family, and because 
administrative considerations limited the number of different locations 
that could be studied, the county was chosen as the primary sampling 
unit and the number of counties to be included in the sample was set 
at approximately 70. 

Major type-of-farming regions of the United States were chosen as 
the primary basis of stratification. Geographic differences condition the 
type of agriculture prevailing in the different regions of the United 
States and thus affect importantly the social and economic character- 
istics of farm people. In addition, it was desired that the sampling de- 
sign provide a basis for summaries for major type-of-farming regions 
as well as for the United States. The major type-of-farming regions 
were adapted from a revision of the well-known BAE type-of-farming 
delineation developed in 1937. All counties of the United States were 
thus first grouped into 7 major type-of-farming regions and a residual 
group for which no regional summary would be attempted but which 
would be necessary to supplement the 7 major type-of-farming region 
samples in making national summaries. 

Although geographic and type-of-farming criteria were considered 
the most important single basis for stratification and were used as the 
primary basis, geographic stratification per se was not used further in 
determining strata within major type-of-farming regions. This is a de- 
parture from the more customary practices of sampling in rural social 
surveys. Generally, within a major geographic region or state finer 
strata have been developed by the use of minor type-of-farming subregions
consisting of contiguous counties. Such a sample design, however,
relies wholly on geographic control as indicated by the subregional
classification as the only basis for stratification. Experimental work 
has not yet been done to afford an answer as to the superiority in 
stratification of purely geographic control or of geographic control 
combined with other types of information such as used in the sampling 
plan presented here. 

The hypothesis underlying the development of the plan presented 
is that the social and economic characteristics of farm people are importantly
influenced by factors other than geographic; hence information
relating to these other factors should also be used in stratification
in order to assure satisfactory representation of the variations in these 
factors. However, purely geographic or type-of-farming area control 
may be preferable for studies of phenomena, such as crop yields, which 
within a given region or state are more completely determined by 
physiographic factors (such as soil type, rainfall, topography, etc.). 

From the range of characteristics available for use as controls in 
delineating strata within each major type-of-farming region, some 12 to 
14 variables were chosen from the 1940 Censuses of Agriculture and 
Population and related sources. The considerations governing the 
choice of the variables will not be detailed here, but, in general, the 
most important criterion was relevance to, or estimated correlation 
with, the broad classes of phenomena to be observed in the sample 
counties. 

A set of control variables was selected for each major type-of- 
farming region separately, with certain variables used in every region. 
Generally there was more uniformity in the group of variables relating 
to the farm population for the several regions than among the variables 
relating to agricultural characteristics. For example, the proportion of 
the total population of the county which is rural-farm, a rural-farm 
level of living index, and information on migration for the 1930-40 
decade and on total population change for the period 1940-43 were used 
as control variables in every region. Among the agricultural character- 
istics mean age of farm operators, extent of off-farm work of farm op- 
erators, and some variable relating to hired farm labor were used in 
almost every region, while the variables relating to type of production, 
size of farm, mechanization, etc. were selected according to their impor- 
tance in any given region. As examples, in the Corn Belt the proportion 
of farm land in corn was an important agricultural control, in the Cot- 
ton Belt the percentage of farms operated by sharecroppers, in the 
Dairy Region the number of cows milked per farm reporting, in the 
General and Self-Sufficing Region the proportion home consumption 
comprised of the total value of agricultural production, in the Range
Livestock Region the number of cattle kept principally for beef, in 
the Wheat Belt the value of implements and machinery per farm, and 
in the Western Specialty Crop Areas the proportion of the county’s 
total agricultural production produced on farms with a value of prod- 
ucts of $10,000 or more. 

For illustration of the use of component indexes in stratification of 
counties for sampling, data and procedures are presented for the Gen- 
eral and Self-Sufficing Region. This region extends from New England 
down through the Appalachians into the northern edge of Georgia and 
westward to the Ozarks. The region includes 19 per cent of the Nation’s 
farm population. Because the major criterion in delineation of the 
General and Self-Sufficing Region was the absence of any dominant 
crops or livestock in agricultural production, the agricultural charac- 
teristics chosen for controls were more general than in the case of the 
other regions. The 12 population and agricultural variables used as 
controls in the region are listed in Table I. 


TABLE I

INTERCORRELATIONS OF 12 POPULATION AND AGRICULTURAL VARIABLES FOR 552
GENERAL AND SELF-SUFFICING COUNTIES

Identification                   Identification number of variable
number of
variable      1      2      3      4      5      6      7      8      9     10     11     12

       1     --   .179   .305  -.399  -.600  -.260  -.510  -.440   .357  -.468   .409   .256
       2   .179     --   .300  -.115  -.228   .089  -.195   .152   .378  -.207   .227   .055
       3   .305   .300     --  -.153  -.765   .033  -.699   .216   .773  -.441   .318  -.254
       4  -.399  -.115  -.153     --   .234   .150   .219   .260  -.195   .310  -.452  -.230
       5  -.600  -.228  -.765   .234     --   .067   .672  -.080  -.815   .695  -.421   .083
       6  -.260   .089   .033   .150   .067     --   .089   .416  -.121   .037  -.026  -.184
       7  -.510  -.195  -.699   .219   .672   .089     --   .270  -.494   .353  -.445   .010
       8  -.440   .152   .216   .260  -.080   .416   .270     --   .386  -.135  -.154  -.350
       9   .357   .378   .773  -.195  -.815  -.121  -.494   .386     --  -.644   .455  -.284
      10  -.468  -.207  -.441   .310   .695   .037   .353  -.135  -.644     --  -.559  -.048
      11   .409   .227   .318  -.452  -.421  -.026  -.445  -.154   .455  -.559     --   .139
      12   .256   .055  -.254  -.230   .083  -.184   .010  -.350  -.284  -.048   .139     --

Identification of variable

 1  Per cent of total population which is rural-farm
 2  Altitude measure for County Seat
 3  Replacement rate of rural-farm males, 25-69 years of age
 4  Per cent change in total civilian population, 1940-43
 5  Rural-farm level of living index
 6  Per cent change in rural-farm population through migration, 1930-40
 7  Mean age of farm operator
 8  Per cent farm operators reporting 100 days or more of work off farms comprise of those reporting work off farms
 9  Value of home consumption as per cent of value of all farm products
10  Mean value of all farm products
11  Family labor as per cent of all farm labor
12  Value of livestock sold as per cent of value of all farm products

From the 552 counties in the General and Self-Sufficing Region only 
about 10 were to be included in the national sample. The problem of 
stratification then was one of grouping the 552 counties into 10 groups 
or strata, with each stratum containing counties as nearly alike as pos- 
sible with respect to the 12 characteristics which had been chosen as 
control variables. The one county drawn into the sample from a stra- 
tum would then be representing a group of counties which were all 
relatively similar with respect to these 12 important items. 

Since in the sample counties certain qualitative material is to be
developed which cannot be statistically weighted together, a further
requirement in making the stratification was imposed, namely, that each
of the 10 strata should contain approximately the same number of farm
people. Thus a regional summary of 10 cultural studies made in the 10
sample counties would not be distorted by inequality in the importance
of the several studies, as would be the case if one county represented
two or three times as many people as another.


TABLE II

CORRELATION OF 12 POPULATION AND AGRICULTURAL VARIABLES WITH THREE
INDEPENDENT COMPONENTS, 552 GENERAL AND SELF-SUFFICING COUNTIES

Identification            Component             Proportion of
number of                                       variation explained
variable              I        II       III     by three components
                                                (Per cent)

     1             -.669    -.499     .058          70.0
     2             -.388     .168     .458          38.9
     3             -.792     .357    -.247          81.6
     4              .430     .447    -.322          48.8
     5              .906    -.127     .158          86.2
     6              .127     .515     .471          50.3
     7              .753     .119     .32           68.8
     8              .006     .866     .262          81.9
     9             -.847     .388    -.064          87.2
    10              .759    -.042    -.244          63.7
    11             -.660    -.228     .359          61.6
    12              .023    -.659     .391          58.8

Proportion of variation
in 12 variables explained
by component (per cent) 37.9     19.1      9.6          66.6

Identification of variable

 1  Per cent of total population which is rural-farm
 2  Altitude measure for County Seat
 3  Replacement rate of rural-farm males, 25-69 years of age
 4  Per cent change in total civilian population, 1940-43
 5  Rural-farm level of living index
 6  Per cent change in rural-farm population through migration, 1930-40
 7  Mean age of farm operator
 8  Per cent farm operators reporting 100 days or more of work off farms comprise of those reporting work off farms
 9  Value of home consumption as per cent of value of all farm products
10  Mean value of all farm products
11  Family labor as per cent of all farm labor
12  Value of livestock sold as per cent of value of all farm products

To utilize information on all the selected control variables in strati- 
fication, mutually uncorrelated component indexes were employed. 
Each index is a linear function of the 12 control variables, with the 
weights for the variables being determined by component analysis of 
the matrix of intercorrelations of the variables. Component analysis of 
the matrix shown in Table I yielded the correlation coefficients between 
the components and each of the variables shown in Table II. These co- 
efficients of Table II formed the weights used in constructing indexes of 
the components. 
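
As a rough illustration of how such weights can be obtained from a correlation matrix, the sketch below finds the first component by power iteration. This is an assumption about the numerical procedure, which the text does not spell out, and the function name `first_component` is hypothetical. Only the intercorrelations of variables 3, 5, and 9 from Table I are used, so the result is illustrative and is not expected to reproduce Table II. The sign of a component is arbitrary; here it is fixed so that the first variable has a positive weight.

```python
import math

def first_component(R, iterations=200):
    """Dominant component of a symmetric correlation matrix R.

    Returns (eigenvalue, loadings); each loading is the correlation
    between one variable and the component.
    """
    n = len(R)
    v = list(R[0])  # starting vector: the first row of R
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Rv = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * Rv[i] for i in range(n))
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    return eigenvalue, loadings

# Intercorrelations of variables 3, 5, and 9 as given in Table I.
R = [
    [1.000, -0.765, 0.773],
    [-0.765, 1.000, -0.815],
    [0.773, -0.815, 1.000],
]

eigenvalue, loadings = first_component(R)
share = 100.0 * eigenvalue / len(R)  # per cent of total variation explained
print(round(eigenvalue, 3), [round(a, 3) for a in loadings], round(share, 1))
```

The sum of the squared loadings equals the eigenvalue, which is why dividing it by the number of variables gives the proportion of variation explained by the component.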

If the problem were to select a single variable for grouping counties 
into strata, the first component is the variable which would afford the 
“best” basis for grouping together counties so as to provide maximum 
homogeneity within groups with respect to the 12 control variables.¹ 
However, we do not have direct information on the counties’ values 
with respect to the first component, which could be used in grouping 
the counties into strata. But the component index as described above 
affords a basis for estimating for each county its value with respect to 
the first component. 

The index equation for estimating a county’s value for the first com- 
ponent (I) from its value on the 12 variables listed in Table I is as 
follows: 


\[
\begin{aligned}
I = {} & -.669z_1 - .388z_2 - .792z_3 + .430z_4 + .906z_5 + .127z_6 \\
       & + .753z_7 + .006z_8 - .847z_9 + .759z_{10} - .660z_{11} + .023z_{12},
\end{aligned}
\]


where the z’s represent standard scores of the control variables. This 
index was computed for each of the 552 counties in the region. On the 
basis of the resulting index values, the counties were grouped into five 
classes, each containing approximately the same number of farm popu- 
lation. Examination of the index formula indicates that counties with 
a high value on Index I tend to have a high value on the rural-farm 
level of living index (r=.906), a high total value of products per farm 
(r=.759), a high average age of farm operators (r=.753), and low values
on the proportion home consumption is of all farm products (r=-.847),
on the replacement rates of males 25-69 years of age (r=-.792), on the
proportion of population living on farms (r=-.669), and on the proportion
family workers comprise of all farm workers (r=-.660). Thus classification
of counties into 5 groups according to this index means that the counties
in a group will be relatively similar with regard to the variables cited
which are highly correlated (either positively or negatively) with the
first component.

1 S. S. Wilks, “Weighting Systems for Linear Functions of Correlated Variables When There Is No
Independent Variable,” Psychometrika 3 (March 1938), pp. 24-43; Harold Hotelling, “Analysis of a
Complex of Statistical Variables into Principal Components,” Journal of Educational Psychology 24,
pp. 417-441, 498-520.
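
The index computation and the grouping into classes of roughly equal farm population can be sketched as follows. The weights are the component I correlations of Table II. The county scores and populations are hypothetical, and the cutting rule (assign each county by the midpoint of its share of cumulative population) is one plausible reading of "approximately the same number of farm population", not necessarily the procedure actually followed.

```python
# Weights for Index I: the component I column of Table II.
WEIGHTS_I = [-0.669, -0.388, -0.792, 0.430, 0.906, 0.127,
             0.753, 0.006, -0.847, 0.759, -0.660, 0.023]

def component_index(z_scores, weights=WEIGHTS_I):
    """Weighted sum of a county's 12 standard scores."""
    if len(z_scores) != len(weights):
        raise ValueError("expected one standard score per control variable")
    return sum(w * z for w, z in zip(weights, z_scores))

def equal_population_classes(index_values, populations, n_classes):
    """Rank counties by index value and cut the ranking into classes
    holding roughly equal shares of the total population."""
    order = sorted(range(len(index_values)), key=lambda i: index_values[i])
    total = float(sum(populations))
    classes = [0] * len(index_values)
    running = 0.0
    for i in order:
        midpoint = running + populations[i] / 2.0
        classes[i] = min(n_classes - 1, int(midpoint * n_classes / total))
        running += populations[i]
    return classes

# Hypothetical county: one standard deviation above the regional mean on
# variable 5 (level of living index), at the mean on all other variables.
z = [0.0] * 12
z[4] = 1.0
index_value = component_index(z)

# Hypothetical index values and farm populations for four counties.
classes = equal_population_classes([0.3, -1.2, 2.0, 0.1], [10, 10, 10, 10], 2)
print(index_value, classes)
```

For the region described here the same function would be called with 552 index values and `n_classes=5`.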

To arrive at 10 strata, it would have been possible to use 10 class 
intervals of Index I. Instead, a second component index, uncorrelated 
with Index I, was developed and each of the 5 groups determined by 
Index I was subdivided on the basis of counties’ values on Index II. 
Because the components are mutually independent, the control at- 
tained by use of the second component is a net addition to the control 
attained by use of the first component. The correlation coefficients in 
the original matrix were reduced by the amount explained in each by 
the first component. For this reduced matrix, a repetition of the 
processes used in deriving the weights for the first component yielded 
weights to construct an index for the second component. Index values 
were computed for each county and the 5 groups of counties subdivided 
into 10 strata according to county values on Index II. From Table II 
it can be seen that the proportion of farm operators working 100 days 
or more off the farm during a year is most highly correlated with 
Index II, and the importance of livestock next most highly (nega- 
tively). 
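
A small sketch of the two steps just described. `reduced_correlation` removes from one correlation the part accounted for by a component, taken here as the product of the two variables' correlations with that component; this is the usual reading of "reduced by the amount explained" and is an assumption. `subdivide` splits each Index I group into a low and a high half on Index II with population balanced between the halves; the balancing rule, the function names, and the four-county data are hypothetical.

```python
def reduced_correlation(r_jk, a_j, a_k):
    """Correlation left between variables j and k after removing the part
    accounted for by a component with loadings a_j and a_k."""
    return r_jk - a_j * a_k

def subdivide(groups, index_ii, populations):
    """Split every Index I group into a low and a high half on Index II.

    The stratum label is 2 * group + half, so 5 groups give 10 strata.
    """
    strata = [None] * len(groups)
    for g in sorted(set(groups)):
        members = sorted((i for i in range(len(groups)) if groups[i] == g),
                         key=lambda i: index_ii[i])
        total = float(sum(populations[i] for i in members))
        running = 0.0
        for i in members:
            midpoint = running + populations[i] / 2.0
            strata[i] = 2 * g + (0 if midpoint < total / 2.0 else 1)
            running += populations[i]
    return strata

# Variables 5 and 9: r = -.815 in Table I; loadings .906 and -.847 in Table II.
residual = reduced_correlation(-0.815, 0.906, -0.847)

# Hypothetical groups, Index II values, and farm populations for four counties.
strata = subdivide([0, 0, 1, 1], [0.5, -0.5, 1.0, 2.0], [1, 1, 1, 1])
print(round(residual, 3), strata)
```

The residual for variables 5 and 9 is close to zero, which shows how much of their strong negative correlation the first component already accounts for.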

In a similar manner, 6 to 12 strata were delineated for each of the 7 
major type-of-farming regions and for the residual group of counties 
not included in any major type-of-farming region. Thus all of the 
counties of the United States (except 14 which have no rural-farm 
population) were grouped into 70 strata, with each stratum including 
counties relatively homogeneous with respect to some 12 agricultural 
or farm population characteristics deemed important for that region. 

The total number of counties to be studied, approximately 70, was 
set largely by administrative considerations. Since it was desired to 
maximize the control afforded by stratification, the number of strata 
to be delineated was also set at approximately 70, with only one county 
from each stratum to be included in the sample. Allocation of the 70 
strata among the 8 major regions was made by a first approximation of 
equal numbers to each region, modified in the light of: (1) variability 
of counties within the region, (2) the region’s share of farm population, 
and (3) the region’s share of agricultural production. 

When the number of strata for a region was set, the next problem 
was to determine how many component indexes to use and how many 
class intervals of each. The principles generally followed in the present 
stratification were: (1) to use an index for each component explaining 
more than 10 per cent of the total variation of the control variables; 
(2) to use more class intervals for the index of the component which ex- 
plained the greatest proportion of the variation of the control variables. 
For example, in the General and Self-Sufficing Region only the first 
two components (explaining 37.9 and 19.1 per cent, respectively) were 
used, since the third component explained less than 10 per cent of the 
total variation. More class intervals of the index of the first component 
were used than of the second, because it explained approximately 
twice as great a percentage of the variation; however, there was no 
precise guidance on the best allocation of classes between the two. 
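
The 10 per cent rule can be checked against Table II with a short sketch. The proportion a component explains is taken as the mean of the squared correlations between the component and the 12 variables. The loadings for components I and II are read from Table II, the share for component III is the published figure, and the function names are hypothetical.

```python
# Correlations of the 12 variables with components I and II (Table II).
COMPONENT_I = [-0.669, -0.388, -0.792, 0.430, 0.906, 0.127,
               0.753, 0.006, -0.847, 0.759, -0.660, 0.023]
COMPONENT_II = [-0.499, 0.168, 0.357, 0.447, -0.127, 0.515,
                0.119, 0.866, 0.388, -0.042, -0.228, -0.659]

def share_explained(loadings):
    """Per cent of the total variation of the standardized variables that
    a component explains: the mean squared loading, times 100."""
    return 100.0 * sum(a * a for a in loadings) / len(loadings)

def components_to_use(shares, threshold=10.0):
    """1-based positions of components explaining more than `threshold` per cent."""
    return [k + 1 for k, s in enumerate(shares) if s > threshold]

# The third share is the published Table II value, not recomputed here.
shares = [share_explained(COMPONENT_I), share_explained(COMPONENT_II), 9.6]
selected = components_to_use(shares)
print([round(s, 1) for s in shares], selected)
```

Under these assumptions the first two components pass the threshold and the third does not, in agreement with the choice described above.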

During the development of this sample design, some experimental 
work was done to afford guidance at different steps. However, pressure 
of time did not permit full exploration of the relative advantages of 
alternative methods or procedures. With the increasing use of sampling 
in national surveys, governmental and private, there is great need for 
further research in determining which methods of stratification are 
most efficient for given survey situations. In further exploration of the 
problems of stratification in sampling, the following questions need to 
be worked on: 

(1) For what types of inquiries is the combination of geographic control and 
other control variables more or less efficient than only geographic con- 
trol for stratification? 

(2) If control variables are to be used, when is it more efficient to stratify 
by class intervals of one, two, or three observed variables and when is it 
more efficient to stratify by one, two, or three component indexes based 
on information from a considerably larger number of observed variables? 

(3) If component indexes are to be used, what criteria can be used in deter- 
mination of the number of control variables to be used, the number of 
component indexes to be used, and for each index, the number of class 
intervals? 
CHARTS SHOULD TELL A STORY 


J. A. LIVINGSTON† 
Washington, D. C. 


Charts have two functions: (1) to enable the statistician, economist,
or any one else to study a set of data in visual form and (2) to
convey information, which oftentimes means to convey the conclusions
the analyst reached after studying the chart. It is this second function
that is the subject of this article.

The type of chart needed depends on its purpose. When the economist
or statistician is using a chart professionally, when he is studying
data, a rough drawing in pencil on graph paper, properly proportioned
and drawn to scale, will do. Neatness is an incidental value, necessary
only when observation of minute fluctuations is important.


A MEANS OF POPULARIZING DATA 


But if a chart is to be used to convey information—to put a point 
across—a different standard of draftmanship is in order. In this func- 
tion, the chart is a means of popularizing data; as such it must have 
sales appeal. Getting that sales appeal takes time, effort, and imagina- 
tion. 

One principle can be clearly stated. A chart ought to have form as 
well as substance. When we learned drawing in grade school, one of the 
first things teacher taught was to put a border around the picture. A 
chart is a picture. Yet our diagrams in textbooks, economic and statis- 
tical journals, and even books and magazines of large circulation are 
often formless. 

On the next page is a chart taken from an economic history of 
the United States; it is better than average. You can see at once the 
demands it makes of the reader. The title is at the bottom in extremely 
small type. It’s almost as if the publisher were playing hide and seek 
with the reader. Moreover, the vertical and horizontal lines are too 
numerous, and hence obtrusive. 

BRING OUT THE CURVES! 


Below it is the same chart dressed up. There are fewer vertical and
horizontal lines to distract the eye. The title is put at the top to declare
immediately to the world what the chart is all about. The unit of
measurement is tucked away unobtrusively between the right and left
borders, on both sides for balance; the average reader does not care
whether he’s dealing in millions of tons or thousands of barrels per day
or hundreds of carloads per week. And the heavier curves focus attention
on the main event. The curve’s the thing.

† The author is indebted to Roy T. Frye for preparation of the charts and to Jerome Jacobson for
compilation of the data.

The professional economist or statistician, of course, frequently
presents charts for his colleagues. Therefore, he emphasizes what he knows
they are going to look for: the unit of measurement, the source of
data. The layman is turned away by such details and it’s best therefore
to get them out of the eye’s way. Instead, the chart should be devised
so as to draw attention to the central idea.

[Charts, page 344: “Sales of Retail Stores” (standard version) and
“Divergent Trends in Wartime Sales: Retail volume of durable goods
(autos, furniture, electrical appliances, etc.) drops off; soft goods up.”
Curves: All Retail Stores, Nondurable Goods Stores, Durable Goods Stores.
Horizontal axis: 1939 to 1944. Vertical axis: billions of dollars.
Source: Department of Commerce.]


HEADLINING THE POINT 


To this purpose, it is useful to employ the headline-writer’s tech- 
nique: Instead of labeling the chart, let the title tell the story. What 
point is to be made? Once you decide that, then make it. 

I'd like to illustrate what I mean. The chart, “Sales of Retail Stores,” 
was done in the standard fashion (page 344). It appeared in a govern- 
ment publication and has a matter-of-fact title—a title which means 
much to economists but little to the average person. 

Below it is the same chart dressed up and given the headline treat- 
ment, “Divergent Trends in Wartime Sales.” Note how the words 
“automobiles, furniture, electrical appliances,” give definition to the 
economic term “durable goods” and how that suggests by the logic of 
opposites what the other economic term “nondurable goods” means. 

The graphic technique can be used to get the reader, himself, to 
analyze economic data. In the chart, “War Redeploys American Labor” 
(page 346), the main title points up the problem: it tells the reader what 
to expect. Then each of the grids tells its own story, with the final grid 
summing up. There is a clear progression of ideas, the effect of which 
is to impart movement to static curves. 

The narrative technique requires no fancy diagrams or figures, 
though Viennese or other picturesque diagrams are not excluded. Each 
grid should have a self-contained point, and be more or less related to 
the next grid. A more elaborate example of the technique is the chart, 
“War Bonds—Turnover Makes the Going Tougher” (page 347). Ob- 
serve in this chart the development of the theme—that it’s harder to 
distribute government bonds and keep them distributed! 

The story chart can also be used argumentatively or persuasively. 
The two-page chart, “War Farm Income at All-Time High—Debt, 
Interest, Foreclosures Down,” on page 348, is an example, comprising 
12 grids, designed to show how the farmer’s economic position has 
improved as the result of high production and high prices. This chart 
was used in the second quarterly report of the Director of War Mobili- 
zation and Reconversion. 

Essentially, the narrative chart calls for a comprehensive grasp of 
the data, a lively imagination in selecting and compressing data, and a 
great deal of work in arranging the grids in sequence so as to suspend 
the conclusion to the end. The primary requisite is predigestion of the
material; after that the titles must be imaginatively contrived to
headline the story so as to produce a swift, clear and memorable impression
upon the eye and mind of the reader, lay or professional.

[Chart, page 346: “War Redeploys American Labor... And raises a basic
problem in postwar manpower redistribution and readjustment.” Panels:
“1. Manufacturing employment has gone up like this;” “2. Nonmanufacturing
has not gone up as sharply;” “3. And instead of 2 workers in
nonmanufacturing to 1 in manufacturing, the relationship is now 1.4 to 1.”
Axis labels: millions of workers (ratio scale); ratio of manufacturing to
nonmanufacturing employment.]

[Chart, page 347 (war bonds): one panel title survives, truncated:
“4. Thus volume of bonds outstand-”]

[Chart, page 348 (war farm income): surviving panel titles: “1. Farm
output is at all-time high,” “2. And prices are beyond 1929 levels;”
“3. Result: Gross farm receipts are also at a peak.” “4. But acreage
harvested is up 9% since 1939,” “6. Therefore, farmers’ total costs are
82% above 1939;” “10. So are (1) average interest rates,” “12. With income
up and interest down, the foreclosure rate is low.”]

The titles, incidentally, serve a double purpose. By using them in 
close juxtaposition to the charts, they become an integral part of the 
text and do away with the cumbersome numerical chart references, 
such as “Chart I,” “Chart II,” etc. Also they can be indexed—as in the 
recent Office of War Mobilization and Reconversion reports. 




INTERNAL MIGRATION AND FULL EMPLOYMENT 
IN THE U. S. 


A. J. JAFFE AND SEYMOUR L. WOLFBEIN 
Washington, D. C. 


MAIN FINDINGS 


INTERNAL migration is a necessary, if not sufficient, condition to the 
[word illegible] of full employment in the post-war period. Study of 
pre-war and war time developments makes it clear that migration is a 
prime requisite for full employment; only if there were to be radical 
shifts in the geographic distribution of employment opportunities—an 
unlikely development in the opinion of the writers—would the need 
for migration be minimized. 

There has long been a close relationship between the geographic 
distribution of economic opportunity and the demographic factors of 
fertility and internal migration. Obviously, the number of births as of a 
given time helps determine the size of the labor force some two to four 
decades later. If a region’s fertility rate is comparatively high, and if the 
growth in economic opportunity fails to keep pace with the growth in 
its labor force, migration or unemployment become the only two alter- 
natives. 

It is important to note, therefore, that the expected, as well as the 
present regional distribution of economic opportunity in the United 
States, by no means will coincide with the distribution of the potential 
labor force; American industry is largely concentrated now—and will 
continue to be in the immediate future—in those regions where the 
natural increase in the population and the labor force are relatively 
small. Full employment, therefore, will be achieved and maintained in 
the near future only by a redistribution of the population. Only if the 
differentials in the basic population factor (the birth rate) should radi- 
cally change, or the distribution of economic opportunity greatly alter, 
will full employment be possible without continuing migration. 


NEED FOR INTERNAL MIGRATION 


As will be shown more fully below, migration of the population is a
much more likely prospect than a geographic shift in economic
opportunity, at least during the decade or two following the war.
Accordingly, the accompanying Table 1 was calculated to show how the
“normal” growth in the labor force during the decade 1940 to 1950 compares
with assumed job opportunities in 1950, State by State. Essentially,
the Table summarizes the net effect of regional differentials in fertility,
labor market growth and economic opportunity.

1 Since this article was prepared the volume Full Employment in a Free Society by Sir William H.
Beveridge appeared (W. W. Norton & Co., N. Y., 1945). Among the three necessary conditions for full
employment Beveridge lists the “organized mobility of labour,” as well as adequate total outlay and
controlled location of industry. See especially pp. 87 and 171. Cf. also The Problems of a Changing
Population, Washington, 1938, esp. Ch. III.

If in 1950 the number of civilian jobs were to equal the war time peak 
(monthly average 1943) plus an additional three millions in the armed 
forces, and if each State maintains its “normal” labor force growth, 
there would be a shortage of labor in the New England, East North 
Central and Pacific Divisions. All other Divisions would have an excess. 
If the total number of jobs were as high as the total labor force (and
this is the equivalent of almost 60,000,000 jobs), these regional
differences, of course, would still prevail.²
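
The comparison can be reproduced from the Division totals of Table 1 (in thousands). The sketch below is only an arithmetic check; the names `DIVISIONS` and `labor_balance` are hypothetical.

```python
# Division totals from Table 1, in thousands:
# (normal labor force 1950, employment Level A, employment Level B)
DIVISIONS = {
    "New England": (3930, 4060, 4340),
    "Middle Atlantic": (13410, 12000, 12810),
    "East North Central": (11670, 11760, 12560),
    "West North Central": (5720, 4980, 5320),
    "South Atlantic": (8200, 7390, 7890),
    "East South Central": (4610, 3650, 3900),
    "West South Central": (5610, 4810, 5140),
    "Mountain": (1730, 1640, 1750),
    "Pacific": (4290, 5120, 5460),
}

def labor_balance(level):
    """Jobs minus normal labor force by Division; positive means a shortage of labor."""
    column = {"A": 1, "B": 2}[level]
    return {name: row[column] - row[0] for name, row in DIVISIONS.items()}

balance_a = labor_balance("A")
shortage_a = sorted(name for name, diff in balance_a.items() if diff > 0)
print(shortage_a)
print(sum(balance_a.values()))
```

Under Level A the Divisions with more assumed jobs than workers are New England, East North Central, and Pacific, and the national total of jobs falls short of the normal labor force by 3,760 thousand.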

The estimates presented in Table 1 indicate the latent regional
differences in employment opportunity after the war, under assumed
conditions. It is common knowledge, for example, that there has been
extensive internal migration since 1940³ and that the large differences 
among regions noted above may not necessarily be found in 1950. The 
relative position of the different Divisions as shown in Table 1, however, 
is bound to prevail in the long run. This arises out of the knowledge that 
the differential birth rates which were producing differences in the 
rates of growth of the labor force among the various regions in the past 
are still in existence and will continue for some time in the future. 





2 It should be noted that Table 1 gives the benefit of the doubt to each State on two scores: (1) So
far as the labor force is concerned, the estimate is that of the “normal” labor force, i.e., the number of
persons 14 years of age and over who can be expected to be workers solely on the basis of population
growth, modified by such long time trends as the increasing participation of women and the declining
proportion of workers among the very young and very old. The number in the labor force in each State
is underestimated by the amount of the abnormal wartime increase in workers that remains in
competition for jobs after the war. By April 1944 the abnormal increase already amounted to 6,700,000
workers. Cf. “Sources of Wartime Labor Supply,” Monthly Labor Review, Aug. 1944. (2) On the job side
of the ledger, each state has been credited with its wartime total of job opportunities, and with its
contribution to a 3 million man armed force as well.

3 Cf. Census Release P-44, No. 17, Interstate Migration and Other Population Changes: 1940-1943,
by Hope T. Eldridge. It may also be noted however that the “normal” labor force reflects a certain
amount of migration, e.g., the rural to urban movement which is indicated in the increased labor
market participation of women.


EXPERIENCE SINCE 1900


An examination of the history of regional differences in economic
opportunity and its relation to long time trends in the birth rate and
internal migration clearly indicates the impact of these demographic
factors upon employment and labor force problems. The experience of
the years from the turn of the century to 1940 is discussed here;
developments during the war years are examined in the following section.


TABLE 1

COMPARISON OF “NORMAL” LABOR FORCE IN 1950 AND ASSUMED LEVELS OF
EMPLOYMENT FOR THE UNITED STATES

By Division and States: all data in thousands

                          Normal        Employment Levels
Division and State     labor force         A          B

United States             59,170       55,410     59,170

New England                3,930        4,060      4,340
  Me.                        370          370        400
  N.H.                       220          210        220
  Vt.                        150          130        140
  Conn.                      830          950      1,020
  Mass.                    2,000        2,030      2,160
  R.I.                       360          370        400

Middle Atlantic           13,410       12,000     12,810
  N.Y.                     6,640        5,920      6,320
  N.J.                     2,090        1,940      2,070
  Pa.                      4,680        4,140      4,420

E. No. Central            11,670       11,760     12,560
  Ohio                     2,990        3,170      3,390
  Ind.                     1,440        1,500      1,600
  Ill.                     3,570        3,580      3,830
  Mich.                    2,330        2,330      2,490
  Wisc.                    1,340        1,180      1,250

W. No. Central             5,720        4,980      5,320
  Minn.                    1,210        1,000      1,070
  Iowa                     1,040          850        910
  Mo.                      1,650        1,530      1,630
  N.D.                       270          200        210
  S.D.                       270          190        210
  Neb.                       550          480        520
  Kans.                      730          730        770

South Atlantic             8,200        7,390      7,890
  Del.                       130          150        160
  Md.                        840          930      1,000
  D.C.                       360          580        620
  Va.                      1,200        1,100      1,170
  W.Va.                      760          630        670
  N.C.                     1,650        1,320      1,410
  S.C.                       920          740        790
  Ga.                      1,460        1,180      1,260
  Fla.                       880          760        810

E. So. Central             4,610        3,650      3,900
  Ky.                      1,160          800        850
  Tenn.                    1,250        1,000      1,070
  Ala.                     1,230        1,080      1,160
  Miss.                      970          770        820

W. So. Central             5,610        4,810      5,140
  Ark.                       790          670        710
  La.                      1,040          850        910
  Okla.                      940          750        800
  Texas                    2,840        2,540      2,720

Mountain                   1,730        1,640      1,750
  Mont.                      240          210        220
  Ida.                       210          210        230
  Wyo.                       110          120        130
  Col.                       470          410        440
  N.Mex.                     220          160        170
  Ariz.                      210          190        200
  Utah                       220          280        300
  Nevada                      50           60         60

Pacific                    4,290        5,120      5,460
  Wash.                      740          930        990
  Ore.                       470          530        560
  Calif.                   3,080        3,660      3,910

Employment Level A: Assuming employment equals monthly average 1943 civilian employment plus
3 millions in the armed forces. Regional and state distribution of employment Levels A and B
equivalent to that which prevailed during 1943.

Employment Level B: Assuming employment equals total normal labor force in 1950.

Normal Labor Force, 1950. Computed as follows: a. The 1940 population of each State was aged 10 years
by applying survival factors to each age and sex specific group in each of the Northern and Western
States, and to each age, sex, and color group in the Southern States. b. To these estimated 1950
populations were applied the 1940 labor force participation rates. c. Each age and sex group in
each state was then adjusted to agree with the normal labor force totals for the United States
presented in “Normal Growth of the Labor Force in the United States, 1940 to 1950,” by John
Durand and Loring Wood, Census Series P-44, No. 12. It may be noted that the final adjustment
was extremely small. The methods described here produced an estimate which was only 2 per cent
short of the figure arrived at by Durand and Wood.

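
The three steps given in the notes to Table 1 for the normal labor force (aging the 1940 population, applying participation rates, adjusting to national totals) can be sketched as below. Every number, group label, and State name in the example is hypothetical, as are the function names; the sketch shows only the order of operations, not the survival factors or rates actually used.

```python
def age_ten_years(population_1940, survival, next_group):
    """Step a: survivors of each 1940 group, placed in the group they reach by 1950."""
    return {next_group[g]: n * survival[g] for g, n in population_1940.items()}

def labor_force(population, participation):
    """Step b: apply participation rates to the estimated 1950 population."""
    return {g: n * participation[g] for g, n in population.items()}

def adjust_to_control(state_values, control_total):
    """Step c: scale State figures for one group so they sum to the national total."""
    factor = control_total / sum(state_values.values())
    return {state: value * factor for state, value in state_values.items()}

# Hypothetical figures for one State (thousands).
pop_1940 = {"males 15-24": 1000.0, "males 25-34": 800.0}
survival = {"males 15-24": 0.97, "males 25-34": 0.96}
next_group = {"males 15-24": "males 25-34", "males 25-34": "males 35-44"}
rates = {"males 25-34": 0.95, "males 35-44": 0.94}

pop_1950 = age_ten_years(pop_1940, survival, next_group)
workers = labor_force(pop_1950, rates)

# Hypothetical two-State total for one age-sex group against a national control.
adjusted = adjust_to_control({"State X": 921.5, "State Y": 1078.5}, 2040.0)
print(pop_1950, workers, adjusted)
```

In this made-up example the adjustment factor is 1.02, of the same order as the 2 per cent gap mentioned in the notes.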
Fertility. Fertility differentials have always been pronounced in the
United States. In general, the South has consistently had the highest
fertility and the Northeast the lowest, with the other two regions
fairly close to the latter (Cf. Table 2). The percentage decline in the
net reproduction rate since the turn of the century was almost identical
(just short of 30 per cent) in all of the regions except the Far West
which experienced a decline of about one-fifth. For the last forty years
therefore, there has been no basic change in the pattern of geographic
differentials in fertility. Consequently, in the absence of migration, the
South would have had the fastest growing labor force throughout this
period.


TABLE 2

NET REPRODUCTION RATES IN THE U. S. BY REGION AND COLOR
1905-10 TO 1935-40*

                      Net Reproduction Rates
Region and Color                                  Per cent Change
                      1905-10      1935-40      1905-10 to 1935-40

U. S.                   134           98               -27
  White                 134           96               -28
  Nonwhite              133          114               -14
Northeast               112           79               -29
  White                 113           80               -29
  Nonwhite               61           75               +23
Northcentral            131           94               -28
  White                 132           95               -28
  Nonwhite               71           83               +17
South                   161          118               -27
  White                 169          115               -32
  Nonwhite              148          125               -16
West                    117           94               -20
  White                 116           93               -20
  Nonwhite                †            †                 †

* 16th Census of the U. S. Population, Differential Fertility 1940 and 1910. Washington, 1944,
Table 7. Net reproduction rate of 100 means that each generation would just replace itself, if birth and
death rates of a given period were to continue indefinitely, in the absence of net migration. A rate above
100 implies a potentially increasing population; a rate below 100 a potentially declining population.

† Too few cases to afford reliable rates.
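
The percentage declines cited from Table 2 follow directly from the two rates for each region. A minimal check, using the all-classes rows of the table (the names `NRR` and `per_cent_change` are hypothetical):

```python
# Net reproduction rates from Table 2: (1905-10, 1935-40).
NRR = {
    "U. S.": (134, 98),
    "Northeast": (112, 79),
    "Northcentral": (131, 94),
    "South": (161, 118),
    "West": (117, 94),
}

def per_cent_change(early, late):
    """Change from the earlier to the later rate, as a per cent of the earlier rate."""
    return 100.0 * (late - early) / early

changes = {region: round(per_cent_change(*rates)) for region, rates in NRR.items()}
print(changes)
```

The Northeast, Northcentral, and South all show declines of 27 to 29 per cent, while the West shows about 20 per cent, which is the "about one-fifth" referred to in the text.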

Labor Force Growth. Actually, the South has had a larger number of 
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annual additions to the labor force than any other region in the coun- 
try. In fact the pattern of fertility differentials described above is 
clearly reflected in Table 3 which summarizes the data on regional dif- 
ferences in potential labor force entrants. The table shows the number 
of boys 9 to 13 years of age (all of whom would attain the labor force 
age of 14 in a period of five years) expressed as a ratio of male gainful 
workers in 1920 and 1930, and as a ratio of male employment in 1940. 

The South consistently has taken first place in the size of male
potential labor force entrants. And, as might be expected from the
discussion on fertility, the Pacific Coast has just as consistently had the
lowest proportion of potential new workers. Table 3 actually conceals
some very sharp differences which exist among the several States.
Even as early as 1920 male potential labor force entrants ranged from
a high of 22 per cent in South Carolina and 21 per cent in Mississippi
to a low of less than 10 per cent in California, District of Columbia,
and Nevada. This range has been maintained through 1940.⁴


TABLE 3

REGIONAL DIFFERENTIALS IN MALE POTENTIAL LABOR FORCE ENTRANTS
1920, 1930, 1940

Ratio of males 9-13 years to male gainful workers*

                              Ratio                  Rank
Division               1920   1930   1940     1920   1930   1940

U. S.                    -      -      -
New England            .141   .155   .164       5      4      6
Middle Atlantic        .147   .149   .151       4      5      8
East North Central     .130   .129   .156       8      8      7
West North Central     .139   .132   .166       7      7      5
South Atlantic         .181   .171   .197       2      1      3
East South Central     .191   .170   .217       1      2      1
West South Central     .180   .157   .202       3      3      2
Mountain               .140   .142   .194       6      6      4
Pacific                .099   .098   .128       9      9      9

* For 1920 and 1930 ratio was calculated on basis of all male gainful workers; for 1940 figures
represent males 9-13 as a proportion of male employment.

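
The rank columns of Table 3 follow directly from the ratio columns. As a small check, the sketch below recomputes the 1940 ranks from the 1940 ratios printed in the table. The function name `rank_divisions` is introduced here only for illustration and is not part of the original article.

```python
# Ratios of boys 9-13 to male employment, copied from the 1940 column of Table 3.
ratios_1940 = {
    "New England": 0.164,
    "Middle Atlantic": 0.151,
    "East North Central": 0.156,
    "West North Central": 0.166,
    "South Atlantic": 0.197,
    "East South Central": 0.217,
    "West South Central": 0.202,
    "Mountain": 0.194,
    "Pacific": 0.128,
}


def rank_divisions(ratios):
    """Return {division: rank}, where rank 1 is the highest ratio."""
    ordered = sorted(ratios, key=ratios.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


ranks_1940 = rank_divisions(ratios_1940)

# Print the divisions from highest to lowest ratio.
for name, rank in sorted(ranks_1940.items(), key=lambda item: item[1]):
    print(f"{rank}  {name:<20} {ratios_1940[name]:.3f}")
```

The three Southern divisions occupy the first three places, and the Pacific division the last, in agreement with the rank column of the table.
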

4 This, of course, is in part at least a reflection of urban-rural differences. Along this line cf.
Replacement Rates for Rural-Farm Males Aged 25-69 Years, by Counties, 1940-50, by Conrad Taeuber,
U. S. Dept. of Agriculture, Dec. 1944.

Migration. The close correspondence between the regional pattern
of fertility differentials and potential labor force entrants should have
brought about the largest labor force growth in such places as the South
and the West North Central Division; the smallest growth in the number
of workers might be expected in the Northeast and Far West. In
actual fact this was not the case. That the labor force did not grow as
fast as might be expected in certain regions was due, of course, mainly
to migration, the regional trends in which are summarized in Table 4.

TABLE 4
NET CIVILIAN MIGRATION 1920-30, 1930-40, 1940-43

                          Average No. of Migrants Per Year     Average Yearly Migration Rates*
Region and Division        1920-30    1930-40    1940-43        1920-30   1930-40   1940-43

North East                 +15,600    +19,900     -8,200           +.5       +.6       -.2
  New England              -31,400     -2,000    +33,300          -4.0       -.2      +4.0
  Middle Atlantic          +47,000    +21,900    -41,500          +1.9       +.8      -1.6
North Central              -20,000    -69,500    -78,300           -.6      -1.8      -2.0
  East North Central       +67,000     -1,800   +154,400          +2.9       -.1      +5.9
  West North Central       -87,000    -67,700   -232,700          -6.7      -6.1     -18.1
South                     -133,400    -85,800   -299,300          -3.8      -2.2      -7.3
  South Atlantic           -65,400    +10,400    +36,200          -4.4       +.6      +2.0
  East South Central       -58,600    -42,600   -178,900          -6.2      -4.1     -17.2
  West South Central        -9,400    -53,600   -156,600           -.8      -4.3     -12.4
Far West                  +137,700   +135,400   +470,400         +13.2     +10.5     +32.6
  Mountain                 -23,900     +5,100    -14,900          -6.8      +1.3      -3.7
  Pacific                 +161,600   +130,300   +485,300         +23.5     +14.5     +46.9

* Per Thousand. Based on the average population at the beginning and end of the period.
Adapted from Eldridge, Hope T. op. cit. and Shryock, H. S. “Internal Migration and the War,”
Jour. Am. Stat. Assn., March 1943.

During the decade 1920 to 1930 there was a net out-migration of
about one and a third millions from the South, while the Far West
experienced a net in-migration of approximately the same size during
this period. During the following decade, 1930 to 1940, the South
continued to export people, having a net out-migration of a little under 
one million. In general, the basic pattern of a net movement out of the 
high fertility areas into areas of lower fertility was observed during the 
two decades following the last war. This resulted in the labor force 
growing at a slower rate than would be expected in regions of high fer- 
tility, with the reverse in the lower fertility regions. 
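
The decade totals quoted in the text can be recovered from the yearly averages in Table 4, and the rate column implies an approximate population base. The sketch below works through both steps for the South and the Far West. The helper names are illustrative only, and because the published rates are rounded to one decimal, the implied populations are rough figures rather than census counts.

```python
# Average net migrants per year, copied from Table 4
# (negative values are net out-migration).
annual_net_migrants = {
    "South": {"1920-30": -133_400, "1930-40": -85_800},
    "Far West": {"1920-30": 137_700, "1930-40": 135_400},
}

# Average yearly migration rates per thousand, copied from Table 4.
annual_rate_per_thousand = {
    "South": {"1920-30": -3.8, "1930-40": -2.2},
    "Far West": {"1920-30": 13.2, "1930-40": 10.5},
}

YEARS_PER_DECADE = 10


def decade_total(region, period):
    """Net migrants over the whole decade, from the yearly average."""
    return annual_net_migrants[region][period] * YEARS_PER_DECADE


def implied_average_population(region, period):
    """Population base implied by rate = 1000 * migrants / population."""
    migrants = annual_net_migrants[region][period]
    rate = annual_rate_per_thousand[region][period]
    return 1000 * migrants / rate


for region in annual_net_migrants:
    for period in ("1920-30", "1930-40"):
        total = decade_total(region, period)
        base = implied_average_population(region, period)
        print(f"{region:<9} {period}: {total:+,} migrants, "
              f"implied base about {base / 1e6:.1f} million")
```

For the South this gives 1,334,000 out-migrants in 1920-30 ("about one and a third millions") and 858,000 in 1930-40 ("a little under one million").
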

Economic Opportunity. It goes without saying that the high fertility 
areas have experienced net out-migration not simply because they were 
regions of high fertility, but also because there was better economic 
opportunity elsewhere. 

The regional distribution of employment opportunities is perhaps 
best illustrated by the material presented in Table 5, which shows the
concentration in manufacturing wage jobs from the turn of the century 
to the beginning of World War II. As far back as 1899, the North al- 
ready accounted for more than four out of every five manufacturing 
wage jobs; the proportion was just about the same in 1937. Secondary 
areas of concentration have appeared in the Southern Piedmont and in 
the Far West, but in the words of the National Resources Committee 
“the gross pattern of industrial location was already established 50 
years ago. The development of these years have modified but in no way 
reshaped this pattern.”⁵


TABLE 5
DISTRIBUTION OF MANUFACTURING WAGE JOBS IN 200 INDUSTRIAL COUNTIES

                              Per cent Distribution
Industrial Counties in:       1899     1929     1937

Northeast                    86.61    83.00    82.04
  New England                20.68    14.77    14.54
  Middle Atlantic            43.90    37.06    36.40
  East North Central         22.03    31.17    31.10
South                         6.26     8.59    10.02
  South Atlantic              3.40     4.86     6.17
  East South Central          2.57     2.83     2.91
  West South Central           .29      .90      .94
West                          7.13     8.41     7.94
  West North Central          4.50     3.80     3.43
  Mountain                     .28      .25      .21
  Pacific                     2.35     4.36     4.30

National Resources Committee, Structure of the American Economy, Part I. Basic Characteristics.
1939. Cf. also Is Industry Decentralizing?, by Daniel Creamer, Phila. 1935, and Growth of American
Manufacturing Areas, by Glenn E. McLaughlin, Phila. 1935.


With few exceptions, economic opportunity in the service trades and
other non-manufacturing and non-agricultural pursuits will parallel
opportunity in manufacturing and agriculture. Hence, there is little
likelihood of economic development occurring in a region unless there is
first a development in manufacturing or agriculture. Mining, oil
production, transportation, and the resort and recreation industries
(industries based on natural resources or the configuration of the earth’s
surface) are the major non-manufacturing, non-agricultural pursuits
which can be located geographically apart from the areas of
manufacturing and agriculture. These industries, however, include but a
very small proportion of the total number of jobs.

5 Structure of the American Economy, Part I. Basic Characteristics. 1939. Cf. also footnote to Table 5.

Thus, the areas of greatest opportunity have coincided with regions
of lowest fertility and smallest natural labor force growth. It is not
surprising, therefore, that the heaviest migration has been from the
high fertility areas which essentially embrace the predominantly
agricultural South.⁶ It is all the more understandable when one considers
the fact that agricultural employment is by no means the equivalent
of economic opportunity in every case. Often it has merely reflected a
damming up on the farms, that is, disguised unemployment. Dr. Taeuber’s
estimate based on 1940 Census data of a total of more than two and a
half million farm operators working on units deemed to provide less
than full time employment is pertinent.⁷


THE WAR PERIOD 


The years of World War II are witnessing no great changes in the 
basic patterns described so far. They can be summarized by stating that 
the various trends and geographic differentials of the first part of the 
twentieth century have, if anything, become more marked. Certainly 
the rate of migration was stepped up considerably, over the 30’s, as 
was the rate of growth in economic opportunity. 

Since 1940 geographic differentials in fertility have been maintained. 
This can be seen from the correlation of .90 between the crude birth
rates in 1940 and 1943, by States. Percentage changes in the birth
rate since 1940 have been substantially the same in various regions of 
the country. 

As for the pattern of internal migration of the civilian population— 
this too has remained substantially unaltered (See Table 4). The rate 
of migration was speeded up considerably; but the basic pattern of 
movement remained unchanged; it was out of the high fertility rural 
areas of the South and the West North Central Division, into the lower 
fertility and industrialized areas of the North East and Far West. 

Indeed, it can be said that much of the industrial growth which took
place during the war period was made possible by this very extensive
shift of population.⁸ War industry expanded most rapidly in just those
areas which had the smallest excess labor supply, and only the
movement of workers from the surplus areas permitted the tremendous
expansion required by the war.

6 See e.g., Volume and Composition of Net Migration From the Rural Farm Population 1930-1940,
by E. H. Bernert. U. S. Dept. of Agriculture, January 1944.

7 In “Agricultural Underemployment,” Rural Sociology, Dec. 1943.

8 The role of the migrant in large war centers can be seen from the Congested Production Area
reports of the Bureau of the Census (Series CA-3). See also Howard B. Myers, “Defense Migration and
Labor Supply,” Jour. Am. Stat. Assn., March 1942.

During the war years, therefore, the areas of economic opportunity 
continued to be the areas of economic opportunity of the previous 
decades. New plant facilities and primary war contracts were dis- 
tributed regionally in much the same manner as industry had been dis- 
tributed before the war. Further, much of the industrial expansion 
which did occur in previously little industrialized areas was generally 
of the strictly war type (aircraft, munitions, shipbuilding), industries
which appear unlikely to be maintained in the post-war period
on a scale comparable to their present level of activity.⁹


THE POST WAR PERIOD 


On the basis of the above historical picture it is possible to outline 
the probable trends during the post-war period. 

First, it appears highly unlikely that current fertility differentials 
will soon disappear. The socio-economic and psychological factors 
which determine the birth rate change slowly. Only under very unusual 
conditions such as famine, pestilence or war is the fertility rate likely 
to change radically within a short period of time. It appears extremely 
unlikely that any such factors will affect any one region in this country 
to the exclusion of others, and the prediction can be made safely that 
present regional differentials in fertility will continue largely unchanged 
for some time in the future. 

Under these conditions the rate of labor force growth will continue to 
show approximately the same regional differences which were observed 
up to the beginning of the war, and which are bound to prevail during 
the present decade. Thus, in the absence of internal migration the labor 
force in the Pacific Division would have grown by but 4 per cent during 
the period 1940 to 1950, whereas the expected increase would have 
been as high as 18 per cent in the South Atlantic and East South 
Central Divisions (See Table 1). 

Migration since 1940 already has redistributed workers from regions
of potentially greater labor force growth to regions of lesser growth.
But the areas which have produced an excess labor supply up to now
will continue to do so for some time to come.¹⁰ In other words, the
growth of the labor supply, especially in the South, will continue to
outstrip the growth in economic opportunity. In the North and Far
West, on the other hand, the rate of growth in economic opportunity
(assuming the Nation experiences approximately full employment)
will proceed at a faster pace than the rate at which the population is
producing new labor force entrants. With widespread unemployment,
of course, all regions would suffer, but those with relatively low normal
rates of labor force growth would still be at a relative advantage.

9 The correspondence between prewar and wartime trends in new labor force entrants, migration,
and economic opportunity is documented in the writers’ “Post War Migration Plans of Army Enlisted
Men,” Annals, Academy of Social and Political Sciences, March 1945.

Furthermore, it appears highly unlikely that under present condi- 
tions we will witness extensive shifts in the regional pattern of eco- 
nomic opportunity. That the geographic distribution of industry has 
not changed materially since the turn of the century already had been 
noted. Comparatively small shifts have taken place: The TVA has 
brought considerable industry into an area largely devoid of employ- 
ment opportunity in manufacturing; the war period has seen the con- 
struction of much industrial plant in previously agricultural areas. 
However, they all add up to relatively small deviations in the general
picture.¹¹

In summary, then, it appears that large segments of the American 
labor force will continue to be reared in areas where economic oppor- 
tunity will be comparatively small. At the same time, there is little to 
suggest that greatly increased numbers of jobs will become available 
in areas of excess labor supply. It follows that the migration of the 
population will be essential if all are to have jobs. The process of in- 
ternal migration will have to continue until such time when, for each 
region, the rate of labor supply growth is in balance with the rate of 
growth in job opportunities. 


IMPLICATIONS OF A FLUID MIGRATION POLICY 


For the benefit of the Nation as a whole it is obvious that a policy 
maximizing migration opportunities is best. It is pertinent, therefore, 
to note briefly some of the factors which would implement such a policy
in the immediate post war period: 


10 The correlation, by States, between the expected per cent increase in the labor force 1940 to 
1950, and the average yearly crude birth rate for the period 1940 to 1943 was found to be .76. Thus, 
the States which would normally have the largest labor force growth are, in general, still having the 
highest birth rates. 

11 For a more extensive discussion see also Warren S. Thompson, War and Post-war Population
Shifts in the U. S., Institute on Post War Reconstruction, N. Y. University, Series III, No. 12, May 17,
1944.
1. Adequate information relative to the whereabouts of economic
opportunity. Recurrent statistics on the regional distribution of job
openings and business and industrial opportunities must be available,
particularly during the immediate post-war period. Significant changes
in the pattern of industrial production, coupled with the demobilization
of the armed forces, will set millions of persons seeking new jobs once
the war is over. A knowledge of these basic changes in the labor force
and differential regional economic opportunity will be necessary in
order to develop a sound post-war structure of full employment.¹²

2. An adequate nation wide employment service. The development 
of the United States Employment Service has gone far toward bringing 
the man and the job together; in the post-war period the necessity of a 
well functioning USES will be at least as great as during the present 
war period.¹³

3. Handling of unemployment compensation and relief on a national 
basis. Mobility has been restricted in the past by requiring a specified 
length of residence in a community before relief could be granted, thus 
tending to inhibit workers from migrating since they could expect no 
relief if they were unable to obtain employment immediately. Progress 
towards the uniform application of Social Security benefits among the 
various States is diminishing the importance of this factor.¹⁴

4. The elimination of State legislation discouraging the recruiting 
of workers for employment outside the State. Roback, writing in 1943, 
found ten States with emigrant agency laws which discouraged the
out-migration of workers by such devices as “... licenses and fees,
excessive in amount ... levied for the privilege of soliciting workers to
be employed beyond the limits of the State.”¹⁵

5. Provision of more adequate housing and community welfare
services. The war period has seen magnified the effects of inadequate
housing, health, educational and recreational facilities.¹⁶

12 For a more extended discussion of the types of data which can be made available see Statistical
Requirements in the Readjustment Period, prepared by the Division of Statistical Standards, Bureau of
the Budget (Nov. 1, 1944). Cf. also J. C. Capt, The Work of the Census Bureau in 1945 and How it will
help Marketers, address delivered before the Wartime Conference of the American Marketing
Association, Chicago, Ill., Dec. 1, 1944.

13 See Robert K. Lamb, “Mobilization of Human Resources,” American Journal of Sociology, Vol.
XLVIII, No. 3, Nov. 1942.

14 See Problems of a Changing Population, op. cit., p. 118 and Webb and Brown, Migrant Families,
Research Monograph XVIII, WPA, 1938. Cf. also “Unemployment Compensation in the Reconversion
Period: Recommendations by the Social Security Board,” Social Security Bulletin, Oct. 1944.

15 Herbert Roback, “Legal Barriers to Inter State Migration,” Cornell Law Quarterly, Vol. 28, No. 3,
March 1943.

16 Cf. Lamb, op. cit. and Homer Hoyt, “The Structure of American Cities in the Post War Era,”
American Journal of Sociology, Vol. XLVIII, No. 4, Jan. 1943, pp. 475-481.
It is recognized that such a migration policy will, as it has for a long
time in the past, result in the movement of wealth from the areas of
out-migration to the regions of in-migration. As a matter of fact, the
State which has been the largest recipient of migrants has clearly recog- 
nized that situation, as can be seen from the following statement: 

“California got a bargain when 1,300,000 new people came here for war 
jobs. Most of the newcomers were under 40—but over 20. Other states paid 
for their upbringing and education. California reaps their skills, their pro- 
ductiveness, their children, and their vitality. ... Life may begin at 40 
for many individuals but cold economic fact shows that society obtains the 
greatest benefits from the average person between the time he reaches 20 
and his fortieth birthday. . . . California’s vitality and progressiveness have 
always been due, in part, to the constant inflow of just such vigorous young
people.”¹⁷


This matter is doubly important if one recognizes that much of the 
movement is from areas of lower educational levels to areas of higher 
educational levels. In most instances, however, the migrant has been 
educated as well as his home community could afford; his educational 
opportunities could have been improved only with the aid of outside 
assistance. And as long as these areas continue to lose population they 
will continue to sustain economic losses and be unable to bring up 
future generations under the best circumstances. Increased Federal 
aid for educational and child welfare purposes would appear to be in 
order. 

It is beyond the scope of this paper to go into many other issues 
which are related to the economic unbalance between regions, and 
which lead to migration. Among the variety of related factors and 
issues are: such national economic policies as the tariff and export sub- 
sidies; policies of private industry related to the development of such 
industries as steel in the South and West; the nation’s freight-rate 
structure; availability of natural resources; wage differential structures; 
barriers (often psychological) leading to a hesitancy on the part of out- 
side industry to move into a new area; differentials in availability of risk 
capital; differentials in the availability of entrepreneurs and experi- 
enced managers; policies affecting the utilization of government-owned 
plants in the post-war period; and National-State development of 
water power resources, waterways, and navigation. All of these, as well 
as many others, will explain why labor must migrate. It is extremely
important to note, therefore, that any attempt to stop or slow down
migration without simultaneously adversely affecting the national
welfare must overcome the forces making for differential economic
opportunity in the United States.

17 How Many Californians? California State Reconstruction and Reemployment Commission,
Sacramento, July, 1944.








ATTRITION LIFE TABLES FOR THE SINGLE POPULATION 


WILSON H. GRABILL
Bureau of the Census 


PEOPLE in many fields of research are interested in the proportion
of men and women who ever marry. Sometimes population census
data on the marital status of the population have been used to obtain
a rough indication of this proportion.¹ Such data, however, are none too
precise when used for that purpose because it is difficult, if not
impossible, to make an adequate allowance for such factors as possible
misstatements of age by older persons, varying marriage rates in the years
prior to the census, and the higher mortality among single persons than
among married persons. Of these factors, differential mortality is
perhaps the most serious. In an effort to throw more light on the problem
of determining the proportion of men and women who ever marry,
attrition life tables² for the single population have been constructed.
These tables show how a population of single persons is diminished by 
death and by marriage as they pass through life. From such a table a 
more precise measure of the proportion of single persons who ever 
marry can be obtained, and also information on the average number of 
years of life as a single person remaining at each age, that is the average 
number of years before either marriage or death occurs. 

The proportion of persons who will ever marry depends on mortality 
rates as well as on marriage rates. The chance for eventual marriage 
increases as a person passes through the period of infancy and child- 
hood and leaves behind the risk of dying before attaining a marriage- 
able age (Figure 1). The maximum is reached somewhere in the ’teen 
age range. After that age, the probability that a single person will ever 
marry declines, slowly at first and then faster and faster as the ages 
at which most people marry are passed and the remaining single popu- 
lation becomes heavily weighted with confirmed bachelors and spinsters. 

According to the life tables presented here, 85.4 per cent of newborn
males and 87.9 per cent of newborn females may expect to marry
during the course of their lives, on the basis of “normal” marriage rates for
the period 1920 to 1939 and death rates for single persons in 1940. The
proportion who will ever marry reaches a maximum for females at the
age of 16 years (93.5 per cent) and for males at the age of 19 years
(92.7 per cent). At age 65, the chances for eventual marriage are only
two out of 100 for bachelors and less than one out of 100 for spinsters.

1 For examples, see: H. Westergaard, “The Study of Displacements within a Population,” Jour.
Am. Stat. Assn., 1920, Vol. 17, p. 381.

S. D. Wicksell, “Nuptiality, Fertility, and Reproductivity,” Skandinavisk Tidskrift, 1931, p. 125.

Metropolitan Life Insurance Co., “The Chances of Marriage,” Statistical Bulletin, Vol. 23, No. 5,
May, 1942, p. 4. (See also “The Chances of Remarriage for the Widowed and Divorced,” Ibid., Vol. 26,
No. 5, May, 1945. This article is based on marriage records in part, rather than on census data
exclusively.)

2 The term “attrition” life tables is employed here to distinguish the tables from the usual types of
life tables wherein death rates rather than marriage and death rates are the sole source of reduction of
the population in successive years of age. Insofar as marriage may be considered as a special kind of
mortality of the single population the actuarial term “multiple decrement” life tables would be equally
applicable.

The complete expectation of life for a newborn male is 61.4 years,
but his expectation of life as a single person is only 25.4 years. For
newborn females, the corresponding figures are 65.6 years and 23.7 years,
respectively.³ In other words, the average male infant may expect to
spend almost 42 per cent of his lifetime as a bachelor and the average
female infant may expect to spend about 36 per cent of her lifetime as a
spinster, if the terms “bachelor” and “spinster” can be stretched to
cover the periods of infancy and childhood as well as adulthood.

3 The figures cited for the total population are weighted averages of corresponding figures on the
expectation of life for white and nonwhite persons; the weights are the color distribution of the
population in the 1940 census. For the basic figures, see Bureau of the Census, “United States Life Tables,
1939-1941,” Vital Statistics—Special Reports, Volume 19, Number 4.
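
The percentages just quoted are simply the ratio of the expectation of life as a single person to the complete expectation of life. The short sketch below reproduces them from the four figures given in the text; the dictionary layout and function name are illustrative choices, not part of the original tables.

```python
# Expectations of life at birth, in years, as quoted in the text.
expectation_at_birth = {
    "male": {"complete": 61.4, "single": 25.4},
    "female": {"complete": 65.6, "single": 23.7},
}


def share_of_life_single(sex):
    """Per cent of the complete expectation of life spent single."""
    figures = expectation_at_birth[sex]
    return 100 * figures["single"] / figures["complete"]


for sex in ("male", "female"):
    print(f"{sex}: {share_of_life_single(sex):.1f} per cent of lifetime single")
```

The results, roughly 41.4 and 36.1 per cent, match the "almost 42 per cent" and "about 36 per cent" stated above.
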
The expectation of life as a single person reaches a low of just under
eight years for people in their early twenties (Figure 2). At this point, 
marriages are most numerous and the population of single persons suf- 
fers heavy losses. Those who do not marry in their early twenties are 
subject to the lesser marriage rates of older ages, so that their expecta- 
tion of life as single persons temporarily increases. For females in their 
early forties, the expectation of life as a single person is higher, even, 
than for newborn infants. After the early forties the expectation of life 
as a single person once more declines and it becomes more and more 
nearly the same as the complete expectation of life. At age 65, for in- 
stance, the expectation of life as a single person is 13.5 years for fe- 
males, whereas the complete expectation of life is 13.6 years. 

The life tables presented here refer only to single persons, not to mar- 
ried, widowed, and divorced persons. A single person is one who has 
never married. Marriage is considered as a special kind of “mortality,” 
insofar as the population of single persons is concerned. 


SOURCES OF DATA 


The attrition life tables for single males and single females have been 
constructed from the age-specific marriage and death rates shown in the 
first two columns of each accompanying life table, and from similar 
rates, which are not shown, for ages over 65 years. 

Marriage rates. The marriage rates were based partly on data on 
number of first marriages, by age of the bride and of the groom, as
reported for the year 1940 by 23 States,⁴ and partly on 1940 census data
for the population classified by marital status and age.⁵ More precisely,
provisional marriage rates for single males under 23 years old and for 
single females under 22 years old were computed from 1940 census data 
and provisional marriage rates for older single persons were computed 
from the marriage data for 23 States. The 1940 census data were sub- 
stituted in the computation of marriage rates for young people as the 
marriage transcripts for the 23 States were deficient in number of per- 
sons under the legal age of consent and the transcripts seemed to be 
slightly too numerous for persons in their early twenties, on account of 
misstatements of age. For instance, only 264 marriages were reported 
for brides 10 to 14 years old in these 23 States, whereas the 1940 census 

4 Bureau of the Census, “Marriage Statistics—Resident Brides and Grooms by Age, Collection
Area, United States, 1940,” Vital Statistics—Special Reports, Volume 17, Number 9. The collection
area in 1940 comprised 28 States, but only 23 States reported on previous marital status of the bride
and the groom. In using these data, the small numbers of persons of unknown age and of persons of
unknown prior marital status were distributed pro rata among the numbers of known age and marital
status.

5 Bureau of the Census, Sixteenth Census Reports on Population, Volume IV, Part 1.
showed 895 married females 14 years old in these 23 States. The median
age at first marriage for females, as computed from the marriage tran- 
scripts, was 22.8 years, whereas 1940 census data obtained in response 
to a direct query on age at first marriage showed a median of 21.6 
years.⁶ (The median age at first marriage, computed from column 6 of
the accompanying life table for females, is 21.8 years, a figure which 
agrees well with the 1940 census figure.) 

The provisional marriage rates described in the preceding paragraph 
were further adjusted so that they would be more nearly representative 
of “normal” marriage conditions. In the year 1940 there was an
exceptionally large number of marriages, because of improved economic
conditions and because of the large number of persons who had deferred
marriage during the late depression.⁷ Attrition life tables based on
marriage rates for 1940 would not have been representative of the normal
expectation of marriage during one’s lifetime. Accordingly, the com- 
bined series of provisional marriage rates were modified slightly so that 
they would be consistent with the “normal” number of first marriages 
for 1940, or the number which would have been anticipated for 1940 
on the basis of the size of the population in the marriageable ages and 
of average annual marriage rates for the period 1920 to 1939. 
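
The text does not say exactly how the provisional rates were "modified slightly." One simple possibility, assumed here purely for illustration, is to scale every age-specific rate by a single factor so that the marriages implied by the rates equal the "normal" total. All numbers and names in the sketch below are hypothetical and are not taken from the life tables.

```python
def scale_to_normal(provisional_rates, single_population, normal_first_marriages):
    """Scale rates per 1,000 so implied marriages equal the normal total.

    Assumes proportional adjustment at every age, which is one possible
    reading of the adjustment described in the text.
    """
    implied = sum(
        rate / 1000 * single_population[age]
        for age, rate in provisional_rates.items()
    )
    factor = normal_first_marriages / implied
    adjusted = {age: rate * factor for age, rate in provisional_rates.items()}
    return adjusted, factor


# Hypothetical inputs: marriage rates per 1,000 single persons and the
# single population at each age.
provisional = {20: 100.0, 21: 120.0}
singles = {20: 10_000, 21: 8_000}

# The provisional rates imply 1,000 + 960 = 1,960 marriages; suppose the
# "normal" number were 1,764.
adjusted, factor = scale_to_normal(provisional, singles, 1_764)
print(f"scaling factor: {factor:.3f}")
print({age: round(rate, 1) for age, rate in adjusted.items()})
```
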

Mortality rates. The mortality rates shown in the second column of 
each table were largely based on the results of a recent tabulation of 
number of deaths reported for single persons in 1940.⁸ Death rates for
single persons over the age of 20 years were computed from these statis- 
tics. Since the tabulation did not contain sufficient age detail for the 
accurate computation of rates for single persons under the age of 20 
years, death rates for the younger ages were substituted from 1939- 
1941 United States life tables for persons without regard to marital 
status.⁹ The substitution is justified on the ground that persons less
than 20 years old are so largely single that the death rates for all persons 
in this age range are virtually the same as those for single persons. At 
the age of 20 years, for instance, the death rate per 1,000 single males 
was 2.49 while that for all males was 2.47. 


EXPLANATION AND DISCUSSION OF LIFE TABLE VALUES 


Attrition rates. The attrition rates are the marriage and death rates
shown in the first two columns of the accompanying life tables. In the
standard life table, a special kind of mortality rate is used, identified by
the symbol 1,000 qₓ, in which the denominator for each rate is the
population at the beginning of the year of age rather than the
population within the year of age. Although no such symbols are given in the
life tables shown here, the first two columns may be considered as
component parts of the “mortality” rate for each age; that is, their sum
corresponds to the 1,000 qₓ column of a standard life table.

6 Bureau of the Census, “Age at First Marriage,” Population, Release Series P-45, No. 7.

7 Bureau of the Census, “The Wartime Marriage Surplus,” Population, Release Series PM-1, No. 3.

8 Bureau of the Census, “Mortality by Marital Status, 1940,” Vital Statistics—Special Reports,
Volume 23, Number 2.

9 “United States Life Tables, 1939-1941,” op. cit.
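
To make the mechanics concrete, the following is a minimal sketch of an attrition (two-decrement) life table built from marriage and death rates per 1,000 single persons alive at the beginning of each year of age. The four-age rate schedule is hypothetical and far shorter than the actual tables, and the expectation of single life assumes that marriages and deaths are spread evenly through each year of age, an assumption of this sketch rather than a statement of the method used in the article.

```python
def attrition_life_table(marriage_rates, death_rates, radix=100_000):
    """Return one row per age with survivors and the two decrements.

    Rates are per 1,000 single persons alive at the start of the age,
    so their sum plays the role of 1,000 q_x in a standard life table.
    """
    rows = []
    alive_single = float(radix)
    for age, (m_rate, d_rate) in enumerate(zip(marriage_rates, death_rates)):
        marriages = alive_single * m_rate / 1000
        deaths = alive_single * d_rate / 1000
        rows.append({"age": age, "alive_single": alive_single,
                     "marriages": marriages, "deaths": deaths})
        alive_single -= marriages + deaths
    return rows


def proportion_ever_marrying(rows, from_age=0):
    """Share of single persons at from_age who marry at that age or later."""
    start = rows[from_age]["alive_single"]
    return sum(row["marriages"] for row in rows[from_age:]) / start


def expectation_single_life(rows, from_age=0):
    """Average years remaining before marriage or death (even-spread assumption)."""
    start = rows[from_age]["alive_single"]
    person_years = sum(
        row["alive_single"] - (row["marriages"] + row["deaths"]) / 2
        for row in rows[from_age:]
    )
    return person_years / start


# Hypothetical schedule; the last age closes the table (rates sum to 1,000).
marriage_per_1000 = [0, 200, 500, 0]
death_per_1000 = [100, 50, 100, 1000]

table = attrition_life_table(marriage_per_1000, death_per_1000)
print(proportion_ever_marrying(table, 0))  # chance of marriage at birth
print(proportion_ever_marrying(table, 1))  # chance of marriage at age 1
print(expectation_single_life(table, 0))   # years of single life at birth
```

Even in this toy schedule the chance of eventual marriage is higher at age 1 than at birth, because the survivors have left behind the risk of dying in the first year, which is the pattern the article describes for infancy and childhood.
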
Despite the fact that more than half of all males who ever marry
do so by the age of 25 years, and females by the age of 22 years, the 
maximum marriage rates for single persons do not occur until a slightly 
older age, 27 years for males and 23 years for females (Figure 3). These 
maximum marriage rates, however, do not result in maximum numbers 
of brides and grooms at these ages, because the number of eligible single 
persons has already been depleted by marriages at younger ages. 

The trend in the death rate by age and sex for single persons is shown 
in Figure 4. In general, death rates are higher for single persons than 
for others in the adult ages. The higher death rates for single persons 
may be, in part, a consequence of a slightly larger proportion of physi- 
cally unfit people among bachelors and spinsters, and in part it may be 
due to the fact that many single persons, particularly males, are inclined 
to be careless in regard to their eating, sleeping, and other living 
habits. In any event, the differentials in mortality by marital status 
seem to be less for females than for males. The following table shows 
how the death rates for single persons of selected ages compare with the 
corresponding rates for all persons in the period: 1939-1941. 

AVERAGE NUMBER OF DEATHS IN YEAR OF AGE PER 1,000 PERSONS LIVING 
AT BEGINNING OF YEAR OF AGE: UNITED STATES 
By Sex and Selected Ages, for Single Persons, 1940, and All Persons, 1939-1941 

                      Male                              Female 
Age          Single, 1940   Total, 1939-1941   Single, 1940   Total, 1939-1941 
20 years          2.49            2.47              1.71            1.89 
30 years          5.18            3.38              3.32            2.75 
40 years          9.44            5.93              5.05            4.48 
50 years         17.21           12.60              9.04            8.71 
60 years         33.85           26.32             18.51           18.12 
70 years         62.30           54.78             42.12           42.76 
80 years        125.67          123.91            120.36          107.07 

Number alive and single at the beginning of year of age. Column 3 in 
each life table is analogous to the l_x column in a standard life table. 
It shows the number of single persons expected to be living at each 
birthday out of 100,000 born alive. The column shows particularly great 
reductions (through marriage) in the numbers of single persons between 
the ages of 19 and 30 years. 

Number dying in year of age while single and number marrying for the 
first time. Columns 4 and 5 in each life table are together analogous to 
the d_x column in a standard life table. They show the number of losses 
from the single population in each age interval as a cohort of 100,000 
infants passes through life. The figures in the two columns may be com- 
pared for the relative importance of losses through death and through 
marriage. Marriages are more numerous than deaths for single females 
between the ages of 14 and 49 years, and for males between the ages of 
16 and 49 years. 

The figures in column 6 have no counterpart in any column of a 
standard life table. The column is useful as a guide to the number of 
single persons expected to marry during the remainder of their lives. 
The column is an essential intermediate step in the computation of the 
percentage of single persons who will ever marry. 

Stationary population. Columns 7 and 8 correspond to the L_x and T_x 
columns, respectively, in a standard life table. Column 7 shows the 
number of years lived in any one year of age by the survivors of 100,000 
births as they pass through the single period of life. Each survivor who 
passes through that age lived one year in the given year of life but, since 
not all of the cohort lived throughout the year of age, the number of 
years lived in any one year of age by the cohort as a whole is intermediate 
between the number of survivors at the beginning and end of each 
year of age. Column 7 might also be interpreted as the single popula- 
tion by age which would be built up from a constant flow of 100,000 
births per year, no migration, and no change in the mortality and mar- 
riage rates. Similar interpretations may be made for the figures in col- 
umn 8, which consist of accumulations of the figures in column 7. For 
instance, Table 2, column 8 indicates that under the given mortality 
and marriage conditions, the survivors of 100,000 female births would 
live an aggregate of 2,371,968 years as single persons before the last 
survivor died or married. Or this might be interpreted as a population 
of 2,371,968 single females of all ages which would eventually result 
from a constant 100,000 female births per year, no migration, and no 
change in the mortality and marriage rates. 

Percentage of single males and females marrying in year of age and all 
later years. The figures shown in column 9 were derived by dividing the 
accumulated number of marriages (column 6) by the corresponding l_x 
figure for that age (column 3). The figures shown in column 9, of course, 
have no counterpart in a standard life table. They represent the per- 
centage of single people living at a given age who will ever marry during 
the remainder of their lives. 

Average number of years remaining before marriage or death. Column 10 
corresponds to the e̊_x column in a standard life table. The average number 
of years remaining before death or marriage was computed by dividing 
the T_x figures shown in column 8 by the corresponding l_x values in 
column 3. 
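
The column definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the computation actually used for the published tables: the function name, the dictionary keys, and the three-age rate schedule are hypothetical, and the years lived in each age (column 7) are taken as the simple average of the numbers single at the two birthdays, which is one of several ways of placing the figure "between" them.

```python
def attrition_life_table(marriage_rates, death_rates, radix=100_000):
    """Return columns 3 to 10 of a toy attrition life table, one dict per age.

    marriage_rates[x] and death_rates[x] are rates per 1,000 single persons
    alive at the beginning of age x (columns 1 and 2). Illustrative only.
    """
    n = len(marriage_rates)
    alive = [float(radix)]                 # column 3, analogous to l_x
    deaths, marriages = [], []             # columns 4 and 5, together d_x
    for x in range(n):
        deaths.append(alive[x] * death_rates[x] / 1000.0)
        marriages.append(alive[x] * marriage_rates[x] / 1000.0)
        alive.append(alive[x] - deaths[x] - marriages[x])

    rows = []
    later_marriages = 0.0                  # column 6: marriages at age x and later
    years_after = 0.0                      # column 8, analogous to T_x
    for x in range(n - 1, -1, -1):
        later_marriages += marriages[x]
        # Assumption: losses are spread evenly over the year of age, so the
        # years lived (column 7, L_x) average the two birthday counts.
        years_lived = (alive[x] + alive[x + 1]) / 2.0
        years_after += years_lived
        single = alive[x]
        rows.append({
            "age": x,
            "alive_single": single,
            "deaths": deaths[x],
            "marriages": marriages[x],
            "later_marriages": later_marriages,
            "years_lived": years_lived,
            "years_after": years_after,
            # column 9: column 6 divided by column 3, as a percentage
            "pct_ever_marrying": 100.0 * later_marriages / single if single else 0.0,
            # column 10: column 8 divided by column 3
            "years_remaining": years_after / single if single else 0.0,
        })
    rows.reverse()
    return rows


# Made-up three-age schedule, not the article's data.
toy_table = attrition_life_table([0, 500, 1000], [100, 100, 0])
for row in toy_table:
    print(row["age"], round(row["pct_ever_marrying"], 1), round(row["years_remaining"], 2))
```

In the toy schedule the percentage ever marrying rises from birth to the next age because deaths at the first age remove persons who would otherwise have been counted as never marrying, the same pattern the real tables show over childhood.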

Reliability of the tables. The attrition life tables are representative of 
the potential effect of marriage rates at the 1920-1939 “normal” level 
and of the deaths at the 1940 level in the population of single persons. 
The tables are useful primarily as a basis for generalizations as to the 
approximate proportions of single persons who will ever marry, subject 
to variations of a few per cent in the case of predictions. The tables are 
not so accurate as a basis for predicting the absolute number of single 
persons who will be living and unmarried at some postcensal date, be- 
cause marriage rates are so variable in magnitude from year to year. 
The number of marriages which will occur is contingent on many things, 
such as the number and the age distribution of eligible single persons in 
the population, the sex ratio, the psychological temperament of the 
people, and the amount of unemployment. Most of these factors affect 
young adults to a greater extent than they do older persons. 


COMPARISON WITH DATA FOR FRANCE 


Similar calculations, even more detailed than those shown here for 
the United States, have been made for France and also for other countries.10 
The French tables show such characteristics as the probability 
that a marriage will be terminated by divorce or by death of the husband 
or wife after a given number of years of marriage, the median 
age at marriage of single, widowed, and divorced persons in the poten- 
tial stationary population, and the complete expectation of life for 
persons classified by marital status, as well as the probability of mar- 
riage for single persons during the course of their lives. The following 
table shows how the proportion of the single population expected to 
marry compares at selected ages for France and for the United States. 

PERCENTAGE OF SINGLE PERSONS EXPECTED TO MARRY 
DURING THE COURSE OF THEIR LIVES 
By Selected Ages, for the United States, 1940, and for France, 1931 

                       UNITED STATES 1940       FRANCE 1931 
Exact Age                Male     Female       Male     Female 
0 years (at birth)       85.4      87.9        74.0      77.9 
5 years                  91.1      92.7        83.9      86.4 
10 years                 91.7      93.1        84.8      87.3 
15 years                 92.2      93.5        85.6      88.1 
20 years                 92.6      92.1        86.8      86.6 
25 years                 88.0      78.5        79.6      70.1 
30 years                 72.3      55.3        60.5      46.5 
35 years                 49.7      34.3        42.4      29.1 
40 years                 31.7      20.2        28.4      17.3 
45 years                 19.1      11.3        17.9      10.0 
50 years                 11.1       6.1        11.0       5.5 
55 years                  6.2       3.2         6.6       2.8 
60 years                  3.3       1.6         4.1       1.5 
65 years                  1.9       0.8         2.2       0.6 

The somewhat higher proportions shown for the United States than 
for France are partly a result of lesser mortality among single persons 
in the United States in 1940 than in France in 1931, partly a result of 
differences in social customs which affect age at marriage, and perhaps 
also a result of differences in methodology and in completeness of the 
basic data. An examination of the census returns for the two nations 
reveals, however, that among males under the age of 30 years and 
among females at all ages a larger proportion of the population was 
single in the case of France in 1931 than in the case of the United States 
in 1940. 

10 Pierre Depoid, “Tables nouvelles relatives à la population française,” Bulletin de la Statistique 
Générale de la France, Vol. XXVII, Part II, Jan.-March, 1938, pp. 269-324. 

M. S. Somogyi, “Tavole di nuzialità e di vedovanza per la italiana, 1930-1932,” Annali di statistica, 
Series VII, Vol. 1, pp. 199-292. 

M. H. Gastineau-Hills, “Probabilities of Marriage in Australia,” Actuarial Society of Australia, 46th 
Session, 1941. 
A CHART OF THE χ² AND t DISTRIBUTIONS* 


James F. Crow 
Dartmouth College 


IN USING the chi-square and “Student’s” distributions it is often desirable to obtain 
an estimate of the probability associated with a certain value of χ² or t 
which falls between two tabulated values. Interpolation between these values 
usually requires special methods which are not always satisfactory for routine 
work. Various charts and nomographs have been prepared which make simple 
graphical interpolation possible.1,2,3 


About two years ago a chart of the χ² and t distributions was prepared by the 
author for research purposes and was reproduced for use in classes at Dartmouth 
College and Dartmouth Medical School. This chart, which is shown in the figure 
in reduced size, eliminates problems in interpolation and gives probabilities with 
sufficient accuracy for most purposes. Also it has been useful in demonstrating 
the nature of the distributions to students. 


The ordinates of the chart are in units proportional to the normal probability 
integral. The units of the abscissa of the χ² chart are proportional to the square 
roots of the values of χ² while the abscissa of the t chart is linear. The values used 
in plotting the curves were obtained from several sources4,5,6 or in a few cases were 
computed. 


The method of reading the chart is obvious. For example with a χ² value of 17 
based on 7 degrees of freedom, the vertical line corresponding to a χ² value of 17 
is followed upward until it intersects the curve corresponding to N = 7. Directly 
to the left of this point the probability, .017, is read off. With the chart inverted 
probabilities for the t distribution are read in exactly the same way. The probability 
given is the probability of a numerically greater deviation, that is, it is the 
sum of the two tails of the distribution. Since the normal distribution may be 
considered as the limiting case of the t distribution when the number of degrees 
of freedom increases indefinitely, the t chart may be used for obtaining probabilities 
corresponding to values of the relative deviate by considering the number of 
degrees of freedom to be infinity and using the deviation (in terms of standard 
deviation) as the value of t. 
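
For a reader without the chart at hand, the two readings described above can be checked numerically. The sketch below uses only the Python standard library; the function names are hypothetical, and the series for the incomplete gamma function is a convenient choice here, not the method by which the chart was drawn.

```python
import math


def chi2_upper_tail(x, df):
    """Probability that chi-square with df degrees of freedom exceeds x.

    Uses the power series for the regularized lower incomplete gamma
    function, which is adequate for moderate x such as chart values.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = a
    while abs(term) > 1e-15 * abs(total):
        k += 1.0
        term *= z / k
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


def normal_two_tail(deviate):
    """Sum of both tails of the normal curve beyond +/- deviate
    (the t chart read with infinite degrees of freedom)."""
    return math.erfc(abs(deviate) / math.sqrt(2.0))


print(round(chi2_upper_tail(17, 7), 3))   # the chart example: chi-square 17, N = 7
print(round(normal_two_tail(1.96), 3))
```

The first line agrees with the value .017 read from the chart in the example above.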


1 V. A. Nekrasoff, “Nomography in applications of statistics,” Metron, Vol. 8 (1930): 95-99. 

2 M. G. Kendall, The Advanced Theory of Statistics, Vol. 1. 

3 C. I. Bliss, “A chart of the chi-square distribution,” this Journal, Vol. 39 (1944): 246-248. 

4 R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research. 

5 Karl Pearson, Tables for Statisticians and Biometricians, (1924), Vol. 1. 

6 “Student,” “New tables for testing significance of observations,” Metron, Vol. 5 (1925): 105-120. 

* Copies of the chart multilithed on 8½ by 11 inch paper may be obtained free from the Department 
of Zoology, Dartmouth College. 
BOOK REVIEWS 


Oscar Krisen Buros 
Review Editor 


A.S.T.M. Manual on Presentation of Data: Including Supplement A, Presenting 
±Limits of Uncertainty of an Observed Average; Supplement B, “Control 
Chart” Method of Analysis and Presentation of Data; and Tables of Squares and 
Square Roots. Sponsored by Committee E-1 on Methods of Testing. Philadel- 
phia: American Society for Testing Materials, 1945. Pp. ix, 73. $0.85. Paper. 


REVIEW BY EUGENE L. GRANT 
Professor of Economics of Engineering, Stanford University 

THIS is the latest printing of the well known manual used by engineers for 
the past twelve years. The main body of the manual was written in 1933. 
The two supplements first appeared in 1935. The committee which originally 
prepared the manual was made an A.S.T.M. standing committee for its 
periodic review and revision. The manual has been kept up to date by 
minor changes in each printing since 1935. 

This A.S.T.M. committee is composed of engineers who are also statistical 
experts. It is particularly because engineers meeting this description are so 
rare that this excellent manual has proved so helpful to the engineering pro- 
fession. The matter was well expressed by Dr. C. G. Darwin, Director of the 
British National Physical Laboratory, in an address in April 1942 before a 
joint meeting of the Institutions of Civil, Mechanical, and Electrical En- 
gineers. “It seemed to me,” he said, “that there was a defect in the habit of 
thought of many in the engineering profession, and that some sort of cam- 
paign was needed to inculcate in people’s minds the idea that every number 
has a fringe, that it is not to be regarded as exact but as so much plus or 
minus a bit, and that the size of this bit is one of its really important quali- 
ties.” 

It is partly because many engineers have never been sufficiently aware of 
the inevitableness of variability that they have not informed themselves 
about the best methods of dealing with variability. This lack of statistical 
sophistication can lead to many bad consequences, such as uncritical specifi- 
cation of manufacturing tolerances, the specification of uneconomical and 
ineffective acceptance procedures, wasted data in research and development 
work, and inefficient presentation and analysis of experimental results. 

The main body of this manual, which explains the presentation and in- 
terpretation of frequency distributions, gives direct help to engineers in 
presenting experimental results. The clear statements of statistical funda- 
mentals help indirectly with other matters. The committee interpolates a 
number of comments which are of general interest to all statisticians. One 
such pertinent remark is: “We have yet to find an observed frequency dis- 
tribution of over 100 observations of a quality characteristic and purporting 
to represent essentially uniform conditions, that has less than 96 per cent of 
its values within the range X̄ ± 3σ” (p. 27). 
Supplement A discourages engineers from their traditional use of probable 
error as a measure of the reliability of an observed average. In its place are 
substituted “limits within which X̄′ may be expected to lie (9 times in 10, 
or 99 times in 100), in problems involving a single sample of n observations” 
(p. 41). This supplement contains working rules for the number of places to 
be retained in computation and presentation which should be of general value 
to all statisticians. 

Supplement B is of special current interest because of the great wartime 
expansion of the use of statistical quality control techniques in industry. 
This supplement, which gives a concise description of the Shewhart control 
chart, might well be required reading for all practicing statisticians and 
teachers of statistics. Just as the engineering profession in general might 
profit by better acquaintance with statistical method, so also statisticians 
in general might profit by better acquaintance with certain major contribu- 
tions to statistical theory which have been made by engineers. The most 
important of these contributions is the Shewhart control chart. 

Although Supplement B does a good job of explaining how to make con- 
trol charts, it is too brief to give an adequate explanation of when or why 
to make them. For an exposition of the great value of the control chart as 
an instrument for the diagnosis of manufacturing problems, or an explana- 
tion of its general importance to statisticians as a test for the constancy of 
a system of chance causes, the reader must look elsewhere. 


Engineering and Scientific Graphs for Publications. American Standard ASA 
Z15.3—1943. New York: American Society of Mechanical Engineers, 1943. Pp. 
27. $0.75. Paper. 
REVIEW BY DUDLEY J. COWDEN 
Professor of Economic Statistics, University of North Carolina 

THIS concise booklet covers the planning and construction of engineering 
graphs and their preparation for the printer. The types of graphs covered 
are line diagrams other than time series charts, and scatter diagrams. 
If one is interested in time series charts, he should consult also American 
Standard ASA Z15.2-1938. Brief mention only is made of logarithmic and 
probability ruling. In general there is no attempt to suggest different types 
of charts that can be used for specific purposes. 

In Part 1, dealing with design and layout, the keynote is that the chart 
should convey the features that the author considers important, with mini- 
mum effort on the part of the reader. Detailed instructions include scale 
proportions, spacing of coordinate lines, scale and curve designations. Well- 
executed charts illustrate most of the principles. 

Part 2, on construction of the chart, covers the field as well as could be 
done in such a short space. The size of the original drawing is discussed and 
the width of lines, size of plotted points, and style, size, and weight of letter- 
ing. Specific directions are given concerning size of template and pen number 
for making letters, though this is not a part of the standard. 
Although clear instructions are given on all points covered, these instructions 
are flexible enough to allow the exercise of a reasonable amount of 
judgment. It would be difficult to improve this manual without expanding 
its length or scope. 





Guide for Quality Control and Control Chart Method of Analyzing Data. Ameri- 
can War Standards, Z1.1—1941 and Z1.2—1941. New York: American Standards 
Association, 1941. Pp. 15, 47-66. $0.75. Paper. 
Control Chart Method of Controlling Quality During Production. American War 
Standards, Z1.3—1942. New York: American Standards Association, 1942. Pp. 
41. $0.75. Paper. 
REVIEW BY FREDERICK MOSTELLER 
Research Mathematician, Statistical Research Group, Princeton University 
ALTHOUGH W. A. Shewhart’s Economic Control of Quality of Manufactured 
Product was published in 1931, industry did not begin to make widespread 
use of control chart methods until the war. In the national emergency 
there was need for the development of standards in the field of quality con- 
trol. To satisfy this need the American War Standards Z1.1, Z1.2, and Z1.3 
were written by a committee of the American Standards Association upon 
request by the War Department. 

The three Standards form an excellent text on the control chart pro- 
cedures developed by Shewhart. These Standards are printed in two book- 
lets, the first of which contains Z1.1 entitled Guide for Quality Control, 
Z1.2 entitled Control Chart Method of Analyzing Data, and an appendix 
entitled “Control Chart” Method of Analysis and Presentation of Data, which 
is not a part of the American War Standard, but is a reproduction of Supple- 
ment B of the A.S.T.M. Manual on Presentation of Data. The second booklet 
contains Z1.3 entitled Control Chart Method of Controlling Quality During 
Production, and three appendices: Appendix 1 gives factors for computing 
3-sigma control limits from samples of size 2 to 15 by means of ranges, and 
from samples of 2 to 25 by means of the root mean square (this table also 
appears in the appendix of the first booklet); Appendix 2 contains factors for 
computing control limits at levels other than 3-sigma; Appendix 3 consists 
of references. 

Z1.1 is a five-page exposition of the general quality control problem in 
which the concept of a state of statistical control is established and the con- 
trol chart is described and illustrated for averages of measurements taken 
from small samples. In Z1.2, the computational steps are shown for the con- 
struction of control charts for measurements. An example worked by use of 
both ranges and standard deviations illustrates the similarity of the results 
obtained by the two methods. In addition there is an example of a control 
chart for per cent defectives. 
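
As a rough illustration of the computational steps for a chart of averages and ranges, the following Python sketch may help. It is not taken from the Standards: the function name is hypothetical, the data are invented, and the factors A2 = 0.577, D3 = 0 and D4 = 2.114 are the values commonly tabulated for subgroups of five, which should be checked against Appendix 1 of the booklets before any real use.

```python
def xbar_r_limits(subgroups, a2, d3, d4):
    """3-sigma control limits for averages and ranges from small subgroups.

    a2, d3, d4 are the tabulated range factors for the subgroup size;
    the caller supplies them (assumed, not computed here).
    """
    means = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = sum(means) / len(means)
    mean_range = sum(ranges) / len(ranges)
    return {
        # (lower limit, central line, upper limit)
        "xbar": (grand_mean - a2 * mean_range, grand_mean, grand_mean + a2 * mean_range),
        "range": (d3 * mean_range, mean_range, d4 * mean_range),
    }


# Invented measurements in two subgroups of five.
limits = xbar_r_limits(
    [[10, 12, 11, 13, 14], [11, 11, 12, 13, 13]],
    a2=0.577, d3=0.0, d4=2.114,
)
print(limits["xbar"])
print(limits["range"])
```

In practice many more than two subgroups would be used before the limits are taken seriously; two are shown only to keep the arithmetic easy to follow by hand.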

The second booklet, Z1.3, is a well-organized text on control chart pro- 
cedure. This pamphlet considers all the standard kinds of control charts: 
(a) control charts for measurements using either the range or standard 
deviation, called charts for X̄ and R, or X̄ and σ; (b) control charts for fraction 
or per cent and number of defective items, called charts for p or pn; 
(c) control chart for number of defects in samples, called chart for c. 
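
For charts of types (b) and (c), the usual 3-sigma limits can be sketched as follows. This is a minimal illustration assuming equal sample sizes and the customary binomial and Poisson forms for the standard deviation; the function names and the counts are hypothetical and are not drawn from the booklets.

```python
import math


def p_chart_limits(defectives, sample_size):
    """3-sigma limits for fraction defective, equal sample sizes assumed."""
    p_bar = sum(defectives) / (len(defectives) * sample_size)
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / sample_size)
    # A negative lower limit has no meaning for a fraction, so it is set to 0.
    return max(0.0, p_bar - 3.0 * sigma), p_bar, p_bar + 3.0 * sigma


def c_chart_limits(defect_counts):
    """3-sigma limits for number of defects per sample (Poisson form)."""
    c_bar = sum(defect_counts) / len(defect_counts)
    sigma = math.sqrt(c_bar)
    return max(0.0, c_bar - 3.0 * sigma), c_bar, c_bar + 3.0 * sigma


print(p_chart_limits([4, 6, 5, 5], sample_size=100))
print(c_chart_limits([3, 5, 4, 4]))
```

A chart for pn follows from the p chart by multiplying the central line and both limits by the sample size.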

The quality control procedure is discussed starting with the questions of 
what data to gather, how to collect and group it, and how to treat the data 
once gathered to get the appropriate control chart. The kinds of information 
offered by the control chart and the ways in which it provides an objective 
basis for action on a manufacturing process are discussed. Finally Z1.3 
contains illustrative examples of all the types of control charts mentioned 
above. 

As far as information about quality control procedure is concerned, it 
seems fair to say that there is considerable overlapping between the two 
booklets, and that Z1.3, the second booklet, is in general, better organized 
for use in practice. 

Techniques are not given in these booklets for treating special problems 
such as how to obtain control limits when tool wear is present and the speci- 
fication limits are wide, or how to treat a process in which lot-to-lot variation 
is always large compared to within lot variation. 

Although the American War Standards will be of interest primarily to 
people concerned with manufacturing, experimental workers in the sciences 
may also find them profitable reading. 


Statistical Adjustment of Data. W. Edwards Deming (Adviser in Sampling, 
Bureau of the Budget, Washington, D. C.) New York: John Wiley and Sons, 
Inc., 1943. Pp. x, 261. $3.50 (London: Chapman and Hall, Ltd., 1944. 21s.) 


REVIEW BY JOHN H. SMITH 
Acting Chief Statistician, Bureau of Labor Statistics, Washington, D. C. 

IN THIS book a great many types of problems are considered under the 
principle of least squares, that of minimizing an appropriately 
weighted sum of squares of residuals from adjusted values. The generality 
of the approach is indicated by the author’s classification of problems ac- 
cording to the nature of the restrictions imposed upon the adjusted values. 
Considerable emphasis is placed upon (a) problems containing conditions 
without parameters, (b) estimating cell frequencies when marginal totals 
are known, and (c) curve fitting when more than one variable is subject 
to error. Methods based on the characteristic equation of the moment matrix 
in units of the respective standard errors of measurement are not mentioned 
in connection with (c). 

The book has a number of distinctive features. The example in which ran- 
dom errors of known variances are added to both coordinates of selected 
points on a parabola (pp. 218-30) is praiseworthy because it permits the 
reader to compare each aspect of any given practical problem with the cor- 
responding aspect of the example in order to assure himself that the two 
problems are comparable. The use of Lagrange multipliers to simplify and 
systematize the solutions for many types of problems popularly supposed to 
be insoluble illustrates the technical devices which the author has assembled 
and organized for adaptation to many types of problems. The saving of time 
and effort made possible under certain conditions by the use of approximate 
values of parameters is well presented. The section on conditions without 
parameters contains important types of problems which are rarely presented 
in statistics books, and the material on estimation of cell frequencies is in- 
deed timely as the author suggests in the preface. The technical discussions 
are liberally supplemented with bits of good practical advice. The author 
has done a scholarly piece of work assembling materials bearing on the gen- 
eral problem of adjustment of data and adapting them for his purposes. 
Since original articles in this field are not readily accessible, the many well- 
chosen references and historical notes are especially helpful. 

The author’s contributions to the specialized and relatively difficult 
problems which he emphasizes hardly justify his overoptimism with respect 
to the relative effectiveness of the proposed methods. Since a considerable 
amount of effort is still required to solve some of the problems, it is unfor- 
tunate that the author tends to conceal relative advantages by emphasizing 
conditions under which a given approximate method yields results which are 
good enough for purposes of action rather than those under which the method 
has comparative advantages. For example, consider the problem of minimizing 
the sum of squares of residuals of the form y − ax^b based on the six 
observations (1, 9), (1, 11), (2, 19), (2, 21), (4, 39), and (4, 41), where the 
three array means (10, 20, 40) are the required adjusted values. These are 
approximated closely by either (10.10, 20.11, 40.03) or (9.95, 19.96, 40.01), 
values obtained by weighting log y by y² and by the simpler unweighted 
method, respectively. This example illustrates the conditions stressed by 
the author as favorable to weighting log y by y²; that is, the curve fits well 
(it fits the array means perfectly) and the residuals are small, so small that 
r_xy = 0.997. Nevertheless, equal weighting yields results which are superior 
to those obtained by the more complex proposed method as judged by the 
criterion on which weighting by y² is based because the weighting bias introduced 
by weighting by y² is (to two decimal places) three times as great as 
the similar bias in the opposite direction introduced by equal weights. Of 
course, there are conditions under which the weighting bias is trivial in comparison 
with differences in results due to other factors. In such cases, the 
relative advantage of weighting by y² over the unweighted method may be 
great (pp. 199-201). These conditions should be discussed in a manner de- 
signed to help the reader to recognize conditions relatively favorable to al- 
ternative approximate methods. For a practical research worker who chooses 
among alternative methods on the basis of simplicity and effectiveness, the 
class of problems in which log y should be weighted by y? is very limited. 
Even when the criterion is appropriate the advantage in accuracy over that 
of the unweighted method must be significant in order to justify the extra 
effort introduced by the weighting process while, at the same time, the accu- 
racy requirements must not be so exacting as to warrant the moderate 
amount of additional effort necessary for estimating exact weights (p. 202). 
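
The six-observation example in this paragraph can be reworked in a few lines of Python. The sketch assumes that "weighting log y by y²" means a weighted least-squares fit of log y on log x with weights y², and that the unweighted method is the same fit with equal weights; the function names are hypothetical. Under that reading the adjusted values agree with the reviewer's figures to within about 0.01.

```python
import math


def fit_power_curve(points, weight=None):
    """Fit y = a * x**b by least squares of log y on log x.

    weight(x, y) returns the weight given to each point;
    None means equal weights.
    """
    sw = swu = swv = swuu = swuv = 0.0
    for x, y in points:
        w = 1.0 if weight is None else weight(x, y)
        u, v = math.log(x), math.log(y)
        sw += w
        swu += w * u
        swv += w * v
        swuu += w * u * u
        swuv += w * u * v
    b = (sw * swuv - swu * swv) / (sw * swuu - swu ** 2)
    a = math.exp((swv - b * swu) / sw)
    return a, b


def adjusted_values(a, b, xs=(1, 2, 4)):
    """Fitted values of the curve at the three arrays."""
    return [a * x ** b for x in xs]


points = [(1, 9), (1, 11), (2, 19), (2, 21), (4, 39), (4, 41)]
unweighted = adjusted_values(*fit_power_curve(points))
weighted = adjusted_values(*fit_power_curve(points, weight=lambda x, y: y * y))
print([round(v, 2) for v in unweighted])
print([round(v, 2) for v in weighted])
```

Comparing both lists with the array means (10, 20, 40) shows the point at issue: the equal-weight values fall slightly below the first two means, while the y²-weighted values fall above them by a larger amount.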
When the marginal totals of a universe are known, cell frequencies can be 
estimated from inflated sample frequencies by simply distributing the row 
adjustments among the cells in proportion to column totals and column 
adjustments in proportion to row totals. In this way, the proportion for any 
cell in a two-by-two table with equal marginal totals is estimated to be half 
the sample proportion in both cells on its diagonal. Hence, for this simple 
special case, the sampling variance of the estimated proportion is reduced 
from pq/n to 2p(q − p)/(4n), a reduction of more than fifty per cent. When all 
adjustments are relatively small as in the examples in the book, results ob- 
tained by this simple method are satisfactory. 
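
The variance comparison for the two-by-two case can be verified directly. The sketch below assumes a table with all four marginal proportions equal to one half, so that both cells on a diagonal have the same true proportion p (with p not exceeding one half) and the diagonal as a whole has proportion 2p; the function names are hypothetical.

```python
def variance_unadjusted(p, n):
    """Sampling variance of a single cell proportion from n observations."""
    return p * (1.0 - p) / n


def variance_adjusted(p, n):
    """Variance of half the sample proportion on the cell's diagonal.

    The diagonal proportion is binomial with parameter 2p, so its variance
    is 2p(1 - 2p)/n; halving the estimate divides the variance by 4.
    With q = 1 - p this is 2p(q - p)/(4n).
    """
    q = 1.0 - p
    return 2.0 * p * (q - p) / (4.0 * n)


for p in (0.1, 0.3, 0.45):
    ratio = variance_adjusted(p, 100) / variance_unadjusted(p, 100)
    print(p, round(ratio, 3))
```

The ratio equals (q − p)/(2q), which is always below one half, in agreement with the statement that the reduction exceeds fifty per cent.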

The author’s treatment of errors in cell frequencies is not convincing with 
respect to the theoretical advantages of the proposed methods even under 
ideal conditions. For example, he begins with the assumption that the sam- 
pling variance of the frequency of a cell containing more than 10 observations 
is equal to the cell frequency and shows (by methods developed for inde- 
pendent measurements) that the variance of the sum of the cell frequencies 
in a two-celled table is equal to n (instead of zero, the exact value) and that, 
by the process of adjustment, the variance of a cell proportion “is reduced 
from p/n to pq/n” (p. 69). Discussions which suggest that errors in cell 
frequencies are uncorrelated appear also in later sections. This is important 
because the sum of squares of standardized errors in a set of correlated cell 
frequencies is not, in general, distributed as chi-square. In fact, Pearson’s 
original article is mainly devoted to a proof by which the general expression 
for the sum of squares of any complete set of standardized uncorrelated func- 
tions of errors in cell frequencies is shown to be exactly equal to the familiar 
expression for chi-square. The expected values in the denominators are not 
intended to be equal, or even proportional, to the sampling variances of the 
corresponding cell frequencies as suggested by the author’s discussion on page 
102 in which he defends the use of observed frequencies in the denominators 
of the quantity (similar to chi-square) which is minimized by the proposed 
methods. Although it would be desirable to revise this discussion in con- 
formity with the theory of random sampling, it is even more important to 
investigate the relative advantages of alternative methods with respect to 
certain types of bias especially when the bias is known to be most serious in 
certain parts of the table. 


Sampling Inspection Tables: Single and Double Sampling. Harold F. Dodge and 
Harry G. Romig (Bell Telephone Laboratories, Inc., New York, N. Y.). New 
York: John Wiley & Sons, Inc., 1944. Pp. vi, 106. $1.50. (London: Chapman & 
Hall, Ltd., 1945. 9s.) 
REVIEW BY KENNETH J. ARNOLD 
Senior Mathematical Statistician, Statistical Research Group, 
Columbia University 


EXCEPT for the preface and short introduction, the book is a reprinting 
of two papers by H. F. Dodge and H. G. Romig and one by (Miss) D. B. 
Keeling and L. E. Cisne which appeared in the Bell System Technical Journal 
(8: 613-31 O’29, 20: 1-61 Ja’41, and 21: 37-50 Je’42 respectively). The tables 
of the title were a part of the second paper. The three papers have also ap- 
peared as Bell Telephone Laboratories Reprint B-431 and Bell Telephone 
System Technical Publications Monographs B-1274 and B-1345 respec- 
tively. 

The tables are for use in nondestructive acceptance inspection of lots. The 
sampling plans embodied in the tables assume that items from a production 
line are submitted to a consumer in lots, that each item is either conforming 
or defective, and that the consumer accepts some lots after inspection of a 
sample, others after complete inspection. The sampling plans seem originally 
to be intended for and are best suited for the case in which the consumer and 
the producer are parts of the same organization or where the inspection is 
carried out by the producer in the interests of the consumer. 

Two classes of objectives are considered. One requirement for the SL 
(Single Sampling—Lot Quality Protection) tables and the DL (Double 
Sampling—Lot Quality Protection) tables is that the probability of accept-
ing a lot with fraction defective p_t (lot tolerance fraction defective) be P_c
(Consumer’s Risk, always taken to be 0.10 in the Dodge-Romig tables).
One requirement for the SA (Single Sampling—Average Quality Protection)
tables and the DA (Double Sampling—Average Quality Protection) tables
is that the average fraction defective in lots after inspection, p_A (average
outgoing quality, AOQ) be never greater than some number p_L (average
outgoing quality limit, AOQL) regardless of the quality of lots submitted.
In all the tables, average inspection for lots of N items from a production
line for which the expected value of fraction defective produced is p̄ (process
average) is to be the minimum possible under the restrictions imposed by 
the objectives and the types of sampling procedures considered. 

In the SL and SA tables, the class of sampling procedures within which the 
objectives are to be attained is limited to procedures of the following type, 
(single sampling): A random sample of n items is taken from the lot. Each 
item is classified as conforming or defective. If the number of defectives in the 
sample is c or less, the lot is accepted; if the number of defectives is greater 
than c, the remainder of the lot is inspected. Defectives found are replaced 
by conforming items. 
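
As an illustration only, the single sampling procedure just described can be sketched in Python. The function names are hypothetical, the Poisson approximation (mean n times the lot fraction defective p) is assumed throughout, and the outgoing-quality and average-inspection expressions are the usual textbook forms for inspection with replacement of defectives; none of this is quoted from the book under review.

```python
import math

def poisson_cdf(c, mean):
    """P(X <= c) for X ~ Poisson(mean)."""
    return sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(c + 1))

def accept_prob_single(n, c, p):
    """Probability that a sample of n items shows c or fewer defectives
    when the lot fraction defective is p (Poisson approximation, assumed)."""
    return poisson_cdf(c, n * p)

def aoq_single(N, n, c, p):
    """Average outgoing quality: accepted lots keep their defectives in the
    uninspected N - n items; rejected lots are fully inspected and cleared."""
    return p * accept_prob_single(n, c, p) * (N - n) / N

def average_inspection_single(N, n, c, p):
    """Expected number of items inspected per lot of N items."""
    pa = accept_prob_single(n, c, p)
    return n + (N - n) * (1 - pa)

if __name__ == "__main__":
    # Hypothetical plan: lot of 1000, sample of 100, acceptance number 2.
    for p in (0.005, 0.02, 0.05):
        print(p, accept_prob_single(100, 2, p), aoq_single(1000, 100, 2, p),
              average_inspection_single(1000, 100, 2, p))
```

Scanning `aoq_single` over a range of p and taking the largest value gives a rough estimate of the AOQL for a given plan, under the same assumptions.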

In the DL and DA tables, the class of sampling procedures within which 
the objectives are to be attained is broadened to admit procedures of the 
following type, (double sampling): A random sample of n_1 items is taken from
a lot. If the number of defectives in the sample is c_1 or less, the lot is accepted;
if the number of defectives is greater than c_2, the remainder of the lot is
inspected; if the number of defectives is greater than c_1 but not greater than
c_2, another random sample, this time of n_2 items, is taken from the lot. If the
total number of defectives in both samples is c_2 or less, the lot is accepted;
if the total number of defectives is greater than c_2, the remainder of the lot
is inspected. Defectives found are replaced by conforming items. 
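
The double sampling rule above can be sketched in the same way. This is a minimal illustration under assumptions not taken from the book: the function names are hypothetical, the two samples are treated as independent Poisson counts with means equal to the sample size times the lot fraction defective p, and inspection of the second sample is assumed to run to completion.

```python
import math

def poisson_pmf(k, mean):
    """P(X == k) for X ~ Poisson(mean)."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def poisson_cdf(c, mean):
    """P(X <= c) for X ~ Poisson(mean); zero when c is negative."""
    return sum(poisson_pmf(k, mean) for k in range(c + 1)) if c >= 0 else 0.0

def accept_prob_double(n1, n2, c1, c2, p):
    """Probability of accepting the lot on the first or the second sample."""
    # Accepted outright: c1 or fewer defectives in the first sample.
    first = poisson_cdf(c1, n1 * p)
    # Second sample is drawn when c1 < d1 <= c2; accept if d1 + d2 <= c2.
    second = sum(poisson_pmf(d1, n1 * p) * poisson_cdf(c2 - d1, n2 * p)
                 for d1 in range(c1 + 1, c2 + 1))
    return first + second

def average_inspection_double(N, n1, n2, c1, c2, p):
    """Expected items inspected per lot of N: n1 if accepted on the first
    sample, n1 + n2 if accepted on the second, N if the lot is not accepted."""
    first = poisson_cdf(c1, n1 * p)
    second = accept_prob_double(n1, n2, c1, c2, p) - first
    return n1 * first + (n1 + n2) * second + N * (1 - first - second)

if __name__ == "__main__":
    # Hypothetical plan: n1 = 50, n2 = 100, c1 = 0, c2 = 2, lot of 1000.
    print(accept_prob_double(50, 100, 0, 2, 0.02))
    print(average_inspection_double(1000, 50, 100, 0, 2, 0.02))
```

Setting c1 equal to c2 removes the second sample, and the sketch then reduces to a single sampling plan with sample size n1.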

The text of the book explains the motivation of the plans and derives the 
formulas which may be used to set up sampling inspection plans subject to 
either class of objectives and to either limitation on type of plans considered.
Values of n and c or n_1, n_2, c_1, and c_2 are given in the SL and DL tables for
N from 1 to 100,000, p̄ from 0 to ½p_t, and selected values of p_t from 0.005
to 0.10. Values of n and c or n_1, n_2, c_1, and c_2 are given in the SA and DA
tables for N from 1 to 100,000, p̄ from 0 to p_L and selected values of p_L from
0.001 to 0.10. The values given are approximations based for the most part 
on the use of the Poisson distribution in place of the binomial and the hyper- 
geometric. The DL tables further assume a specific division of risk between 
samples and the DA tables are derived in part from the DL tables. 
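
To give a feeling for the size of the approximation mentioned above, the following Python sketch compares the probability of finding c or fewer defectives in a sample under the hypergeometric, binomial, and Poisson distributions. The lot and sample figures are hypothetical and chosen only for illustration; the function names are likewise assumptions.

```python
import math

def hypergeom_cdf(c, N, D, n):
    """P(at most c defectives) in a sample of n drawn without replacement
    from a lot of N items containing exactly D defectives."""
    ways = sum(math.comb(D, k) * math.comb(N - D, n - k)
               for k in range(min(c, D, n) + 1))
    return ways / math.comb(N, n)

def binomial_cdf(c, n, p):
    """P(at most c defectives) in n independent trials with probability p."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(min(c, n) + 1))

def poisson_cdf(c, mean):
    """P(X <= c) for X ~ Poisson(mean)."""
    return sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(c + 1))

if __name__ == "__main__":
    # Hypothetical case: lot of 1000 with 20 defectives, sample of 100, c = 2.
    N, D, n, c = 1000, 20, 100, 2
    p = D / N
    print("hypergeometric:", hypergeom_cdf(c, N, D, n))
    print("binomial:      ", binomial_cdf(c, n, p))
    print("Poisson:       ", poisson_cdf(c, n * p))
```

The three values agree closely when the sample is a small part of the lot and p is small, which is the situation in which the Poisson substitution is usually considered acceptable.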

Since the legitimate uses of data collected under a sampling plan depend 
on the method of and not the reasons for collection, the data can be used for 
purposes other than those implied by the stated primary objectives of the 
plan. Thus, in the book under review, while the plans proposed are initially 
restricted in purpose to acceptance inspection (i.e., to control of quality of 
accepted lots), the authors rightly describe their use in quality control (i.e., 
in the control of quality of production). 

Several other sampling inspection plans are now available which admit a 
broader class of sampling procedures while preserving a significant part of 
the objectives of the Dodge-Romig plans. The inadequacy of presently avail- 
able comparisons of these various plans leads one to hope that someone soon 
will take it upon himself to systematically list objectives and restrictions on 
type of sampling which have been considered, to give as far as possible, op- 
erating characteristics, average sample size and some discussion of the 
adequacy of each plan in attaining other than the primary objectives. 


Wages of Agricultural Labor in the United States. Louis J. Ducoff and others. 
Washington: Bureau of Agricultural Economics, 1944. Pp. 193. Mimeographed. 


REVIEW BY ROBERT J. MYERS
Bureau of Labor Statistics 
Washington, D. C. 

THE importance of the “hired hand” as a factor in modern farm production
is more fully appreciated today than ever before. The shortage of agri-
cultural labor has prompted the importation of thousands of workers from 
neighboring countries and has stimulated urgent appeals to the nonfarm 
population of this country to assist in the planting and harvesting of vital 
crops. Farm wages have rocketed throughout the Nation and in some areas 
have risen enough to entice workers away from manufacturing and mining. 
This descriptive account of the farm laborer and his pay is consequently well- 
timed and will appeal to a variety of interests. 

About four million persons work on farms for wages during a year such as 
1943 but, due to labor turnover and the seasonality of employment, the aver- 
age number employed throughout the year is only about 2.5 millions. Many 
farm laborers hold down mining or manufacturing jobs during part of the 
year, while others are share croppers or own small farms of their own. A 
substantial proportion of all hired farm workers are employed by a small 
number of large-scale enterprises, some of which are organized on a semi-
industrial basis. Although use of hired laborers is most characteristic of agri- 
culture in the Far West and the Northeast, nearly half of all such workers are 
employed in the South. 

A considerable part of this enlightening study is devoted to a discussion 
of the characteristics of farm laborers, the types of farms on which they are 
employed, and various geographical and other factors associated with farm 
wage differentials. It is of interest to note that these factors tend to parallel 
the influences that shape industrial wage structure. 

The extraordinary increase of farm wage rates since the outbreak of the 
war (approximately 150 per cent from 1939 to 1944) is not emphasized in 
this study, which, for the most part, makes comparisons with the farm 
“parity” period, 1910-14. Since that period the hourly earnings of farm 
laborers have risen scarcely more than during the last few years and have 
lagged far behind those of factory workers. The average annual wage income 
of hired farm workers has also risen somewhat less since 1910-14 than that 
of industrial workers. Although the parity measure specified in the Agricul- 
tural Adjustment Act of 1938 is influenced primarily by the incomes of farm 
operators and their families, this study suggests that a similar measure ap- 
plied to hired farm laborers would reveal that parity has not been attained. 

Despite the great gains of recent years, farm wages are typically far below 
the level of industrial wages. The hourly earnings of farm day workers with- 
out board, which for several years before World War II averaged about 15 
or 16 cents, had climbed only to about 37 cents by April 1944. At the latter 
date, average hourly earnings ranged from 18 cents in South Carolina to 72 
cents in the State of Washington. Wages high enough to draw away industrial 
employees were very rare and generally represented the earnings of workers 
paid on incentive basis and engaged in the harvesting of perishable crops. 

The reasons for the disparity between farm and industrial wage rates are 
not discussed specifically in this work, although a number of contributing 
factors are mentioned, including the generally low economic return of farm- 
ing, the immobility of farm labor, and the high fertility of the farm popula- 
tion. The lack of legal protection against substandard wages and the virtual 
absence of unionization among farm workers come to mind as important 
additional factors. 

The wartime regulation of farm wage rates and a number of considerations 
for postwar wage policy are discussed in the final chapters. The inclusion of 
60 tables and 25 charts greatly enhances the value of this work for reference 
purposes. 

Wages of Agricultural Labor in the United States was completed too soon 
to make use of improved and detailed wage statistics provided for in a subse- 
quent Congressional authorization. Readers of this study may also be inter- 
ested in the first results of the new program, published in the March 13, 1945, 
issue of the Bureau of Agricultural Economics’ mimeographed bulletin, 
Farm Labor. 
Quality Control Charts: Being Part 1 of the Revision of B.S. 600:1935, The
Application of Statistical Methods to Industrial Standardisation and Quality
Control. B.S. 600R:1942. B. P. Dudding and W. J. Jennett (Research Labora-
tories of The General Electric Co., Ltd., of England). London: British Standards
Institution, 1942. Pp. 89. 3s. 10d. Paper.
Quality Control Chart Techniques When Manufacturing to a Specification: With
Special Reference to Articles Machined to Dimensional Tolerances. B. P.
Dudding and W. J. Jennett. London: General Electric Co., Ltd., of England,
1944. Pp. iii, 74. 2s. 6d. Paper. (Arlington, Va.: Gryphon Press, 1945. $1.00.)


REVIEW BY HAROLD A. FREEMAN 
Associate Professor of Statistics, Massachusetts Institute of Technology 


THESE two publications explain to practical readers the construction and
use of quality control charts. The first is Part 1 of a revision of an
earlier publication by Professor Egon Pearson; the second covers about the 
same ground, with heavy emphasis on the relation of engineering tolerances 
to quality control technique. 

Both jobs are well done. Quality control charts for attributes and variables 
are described in great detail; there are many arithmetical examples, complete 
glossaries and much practical advice. The usual tables covering the sampling
variation of commonly used statistics are included. Emphasis is on the ap- 
plied side where the authors have had, obviously, considerable experience. 

The two publications cover about the same ground as do the American 
pamphlets issued by the American Society for Testing Materials and the 
American Standards Association. The British publications are somewhat 
more complete, the American a little easier to read. 

It is possible that this particular job has now been done well often enough. 
The reviewer would welcome a pamphlet on the statistical theory on which 
quality control rests for this is not quite as obvious and ironclad as these ex- 
cellent applied publications make it out to be. 


Statistical Methods and Control in Bacteriology. Churchill Eisenhart (Statisti- 
cian, Agricultural Experiment Station and Assistant Professor of Mathematics, 
University of Wisconsin) and Perry W. Wilson (Associate Professor of Agricul- 
tural Bacteriology). Bacteriological Reviews 7(2): 57-137, June 1943. Out of print. 
Paper. (Single copies may be obtained gratis while the supply lasts from the 
Department of Bacteriology, University of Wisconsin.) 

REVIEW BY JOHN W. FERTIG

Professor of Biostatistics, School of Public Health of the Faculty of Medicine, 
Columbia University 


THIS paper is not intended to be a course in statistics. Its purpose is rather
to enable the bacteriologist to recognize problems which might benefit
through statistical interpretation and to phrase his problems more clearly 
when seeking the aid of a statistician. The mathematical formulation has 
been kept to a minimum from the point of view of the statistician but prob- 
ably not from that of the bacteriologist. In a general way most of the tech- 
niques presented in R. A. Fisher’s textbook are discussed, together with some 
of Neyman’s work on the testing of hypotheses and confidence intervals, and
some of Shewhart’s work on statistical control. 

The first half of the paper is concerned with the customary three distribu- 
tions, binomial, Poisson, normal. The discussion of the binomial distribution 
is very short but concise, while the excellent discussion of the Poisson is quite 
detailed, presenting its various applications to the several methods of count- 
ing bacteria. In a paper of this kind it seems that an excessive amount of 
space (16 pages) is devoted to the method of dilution sampling. It is doubtful 
whether the usual bacteriologist can follow all the mathematical intricacies. 
Nevertheless the presentation of this subject is very excellent. On page 86, 
line 31, the formula for the standard deviation of the estimated number of 
organisms per cc. seems to be misstated. 

The second half of the paper considers the usual tests of significance, nor- 
mal curve test, t-test, analysis of variance, regression and correlation, and 
various examples of the chi-square test. The frequently occurring problem 
of comparing two means when the variances are not equal is not discussed. 
The section on the analysis of variance commits an error common to many 
such abridged treatises in concentrating too much on short cuts of computa- 
tion effected by algebraical rearrangements of the formulae, at the expense 
of a discussion of the basic processes. It is doubtful whether any useful pur- 
pose is served in a paper of this kind by presenting examples of the analysis 
of variance in which high order interactions occur, at least when the meaning 
of the term “interaction” is not adequately explained for the nonstatistician. 
While the authors stress the importance of not regarding the conclusion of 
nonsignificance as operational equivalence, they seem to have disregarded 
this advice on occasion, as on page 107, where it is stated, on the basis of F 
being somewhat less than the 5% point, that “it appears that the varieties 
responded identically in the two experiments.” 

The formula for testing the significance of the difference of two intercept 
values of page 116 seems to be given incorrectly. In the graduation of a fre- 
quency distribution by the normal curve, on page 122, where both the mean 
and standard deviation must be estimated simultaneously, it seems more 
correct to use N rather than N − 1 as the divisor for the estimate of the stan-
dard deviation. This reviewer cannot appreciate the need for using the sym-
bol D² instead of χ² to represent the indices of dispersion among samples
drawn from a Poisson or binomial distribution. Such data constitute con-
tingency tables in which one scale of classification is categorical, and indeed
the symbol χ² is used on page 125 for such tables.
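
For readers unfamiliar with the index of dispersion referred to here, a minimal Python sketch for the Poisson case follows. It assumes equal-sized samples and uses the familiar form in which the sum of squared deviations from the mean count is divided by the mean count, with one fewer degree of freedom than the number of samples; the function name and the counts are hypothetical and are not taken from the paper under review.

```python
def poisson_dispersion_index(counts):
    """Index of dispersion for counts assumed to come from one Poisson
    distribution: sum((x - mean)^2) / mean, referred to chi-square
    with len(counts) - 1 degrees of freedom."""
    n = len(counts)
    if n < 2:
        raise ValueError("at least two counts are needed")
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("the mean count must be positive")
    statistic = sum((x - mean) ** 2 for x in counts) / mean
    return statistic, n - 1

if __name__ == "__main__":
    # Hypothetical colony counts from four plates of the same dilution.
    statistic, df = poisson_dispersion_index([4, 6, 5, 5])
    print(statistic, df)
```

The same statistic is obtained by treating the counts as a one-way table with equal expected frequencies, which is the sense in which such data can be regarded as a contingency table.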

The paper is concluded with a short appendix of technical terms and an 
excellent bibliography of 125 references to articles in statistical methods and 
in bacteriology. 

In the opinion of this reviewer too many subjects are covered in too little 
space, with the result that many subjects are covered very sketchily. For 
example, the problems of bioassay are discussed very briefly, as is the correla-
tion coefficient. Probably the latter could be eliminated entirely as having
little importance in bacteriology. While the paper is very well written and
informative to the statistician, it is not clear that the same can be claimed 
for the bacteriologist. 


What The Figures Mean. Stephan Gilman (Vice President and Educational Direc- 
tor, International Accountants Society, Inc.). New York: Ronald Press Co., 
1944. Pp. ix, 127. $2.50.
REVIEW BY HARRY PELLE HARTKEMEIER
Professor of Business Statistics, University of Missouri 


FIGURE work at one time was well confined within the four walls of a book-
keeping or accounting department. But those four walls have bulged and
finally given way until under war conditions and governmental controls the 
compiling, summarizing, and interpreting of figures has become a major busi- 
ness activity. This is true not only of the large corporation but of the little 
business the proprietor of which burns midnight oil over figures having to do 
with price ceilings, inventories, ration points, sales taxes, and income taxes. 
In fact, many a business man must feel himself a derelict on a veritable ocean 
of business figures. Not only has figure work become a major activity of busi- 
ness, but its balloon-like expansion has found too many individuals un- 
prepared to understand the results. (p. 3) 

It is an informal sort of book, deliberately kept simple so that it may 
better meet the needs of those for whom more eshaden but more elaborate 
statistical treatment has little interest. (p. iii) 


The above quotations explain the purpose and nature of the book. There 
is some doubt about the person who should be selected as the one best quali- 
fied to do the work which will answer the question, “What do these figures 
mean?” Although the author admits that the science of statistics has de- 
veloped methods for “interpreting numerical data,” he eliminates the ac- 
countant, the statistician, and the statistical mathematician. The accountant 
is eliminated because of the “rules, conventions, and doctrines of account- 
ing.... Just as one doubtful egg may ruin the omelette, so does the com- 
bination of such accounting estimates [depreciation, apportionment, in- 
ventory valuation] with non-accounting data affect the results.” The statis- 
tician is eliminated because “Unfortunately, most of the methods of statistics 
have been developed in fields quite removed from business. Few statisticians 
are acquainted with the problems of the individual enterprise.” The statis- 
tical mathematician is eliminated because “Precision tools are not required 
in building a cowshed. Similarly, the precision methods of mathematical 
statistics, even if they were fully comprehensible to the business man, are 
inappropriate in relation to the solution of business problems.” 

This situation would seem to force the business man to analyze his own 
data after reading this book. The reviewer is inclined to think that the 
business man is not apt to be very enthusiastic about this prospect after he 
has already burned “midnight oil over figures having to do with price ceilings, 
inventories, ration points, sales taxes, etc.” The reviewer would recommend 
that the business man employ a business statistician, who is different from 
the man the author seems to have in mind when he describes a statistician. 
While it is true that early teachers of statistics spent too much time analyz-
ing data on the length of leaves, the dimensions of beans, grades of students, 
etc., today we are training business statisticians who can give very practical 
help to the business man. If the business man cannot or will not employ a 
good business statistician, then the reviewer is willing to grant that the 
methods given in this book would be the next best choice. This book is the 
best one of its kind the reviewer has encountered. It is not very suitable for 
a college textbook in business statistics, for the author uses many technical 
terms in the field loosely in an every-day conversational sense which is dif- 
ferent from their technical meaning. For example, he speaks of “longer 
term trend” in the data even though the time period used and shown does 
not exceed four years, and he gives the name “trend percentages” to ratios 
obtained by dividing data for the current year by data for the preceding 
year. In only one case does the author use data for as many as four years. 
“Seldom is it practical to undertake studies covering a period of more than 
two years or 24 months” expresses the opinion of the author on time series 
analysis. The author never uses the term business cycle, and the reviewer 
suspects that when the author refers to the trend he means primarily the 
business cycle. This practice is fairly general among business men. 

In those universities and colleges which still cling to the practice of giving 
a course in business management or industrial management to students who 
have not previously completed a course in business statistics, this book would 
serve very well as one of the textbooks for the course in business manage- 
ment. 


Business Leadership in the Large Corporation. Robert Aaron Gordon. Washing- 
ton: Brookings Institution, 1945. Pp. xiv, 369. $3.00. 


REVIEW BY C. CANBY BALDERSTON
Dean of the Wharton School of Commerce and Finance, 
University of Pennsylvania 

DR. Gordon has completed an important study in the important field of
business leadership. Whether or not the reader agrees with the author’s
assumptions as to the nature of leadership, he will find Dr. Gordon’s chapters 
readable and stimulating because of the abundance of his examples. In fact, 
readers may find themselves tempted at times to read the footnotes before 
the text. So much of the book’s contribution is contained in these footnotes 
and the references to well-known individuals and companies are so intriguing 
to students of management that the more delectable bits seem to have 
fallen from the text into the footnotes. 

The contents are divided into three parts: the introduction; business 
leadership by management and outside interest groups; and incentives and 
the professionalization of business leadership. In the introduction the author 
describes the large corporations because these are the setting for his analysis 
of the impact of various forces upon leadership. This portion adds little that 
is new. Next, he describes business leadership in practice and then turns to
his most significant addition to our store of management knowledge in 
Chapter II. Here he treats the decision-making of the chief executive, the 
executive group, the board of directors, the stockholders, outside financial 
groups and the government. His Part III is less interesting except for his 
emphasis upon the distinction between the owner-manager, who risks his 
capital, and the salaried executive who is motivated very differently. Much 
of this phase of the problem has been treated before by President John C.
Baker in his studies of executive compensation. Further, some readers will 
criticize Dr. Gordon’s use of the word “professionalization” in connection 
with business leadership. The author features this word as descriptive of 
the type of manager to be found in large companies, but does not define 
the term. On page 258 he refers to the influence of professional groups, such 
as “lawyers, engineers, and accountants, for the most part.” In the chapter 
dealing with the “professionalization of leadership,” however, he refers to 
professional executives as salaried managers. One somehow reacts adversely 
to such use of “professional.” 

Some readers will feel that portions of the book are redundant. But even 
though the organization of the contents might be improved, the author is to 
be excused because the subject matter is difficult to present in an orderly, 
unified manner. Despite these minor defects of presentation, this book is a 
significant addition to the literature of management. 


Experimental Sociology: A Study in Method. Ernest Greenwood (Instructor in 
Sociology, University of Cincinnati). New York: King’s Crown Press, 1945. Pp. 
xv, 163. $2.25. Paper. 
REVIEW BY LOUIS GUTTMAN
Associate Professor of Sociology, Cornell University 


DISCUSSIONS are still being held as to whether or not sociology is, or can
be, “scientific.” This kind of discussion has often led in particular to a
debate as to whether or not some techniques of research in sociology should 
with propriety be called “experimental.” Sociologists and non-sociologists 
alike who take pleasure in such discussions about words will find the first 
four chapters of Greenwood’s monograph worth reading. 

Those who are interested in finding out about some of the objective re- 
search that is being carried on by sociologists will find a good summary in 
Chapter V. Problems of control are discussed in Chapters VI and VII.
It is in the final chapters, VIII and IX, that we find the motivation for writ- 
ing the monograph—a description and analysis of “ex post facto experi- 
ments.” 

A vast store of data of various kinds lies in the files of various institutions: 
educational, governmental, social work, and the like. These data have not, 
for the most part, been gathered with any particular research design in mind; 
yet here is a mine of material which may contain many research diamonds. 
How can useful inferences be drawn from such data? The problem is, of 
course, not peculiar to sociology, but it is Professor F. Stuart Chapin and 
his students who seem to have given most attention to the possibility of sal-
vaging such information. Chapin calls the process an “ex post facto experi- 
ment,” in contrast to the “projected experiment” where the design is deter- 
mined in advance. 

Greenwood focuses most of his discussion on how closely the ex post
facto approximates the projected experiment. Actually, this is but another
version of how closely a correlational analysis approximates a “causal”
analysis, and Greenwood seems to come out with the same inconclusive 
answer so many of our textbooks give—that it requires trained judgment to 
decide. 

Ex post facto experiments have additional problems over ordinary cor- 
relation analyses. Greenwood’s purpose is to avoid technicalities as much as 
possible, but at least one crucial point ought to be considered. What is the 
sampling process by which the original data are obtained? In many cases it 
may be difficult to specify the universe of samples, or that the sample was 
random—which may invalidate use of standard sampling theory for tests of 
significance and the like. Greenwood recognizes this with respect to possible 
bias in incomplete data, but the problem applies to complete sample data as 
well. 

If it may be granted that ordinary sampling theory applies, then there is 
unnecessary loss of cases by matching according to Greenwood’s description 
when controls are quantitative. Analysis of covariance could obviate the 
elimination of cases. For qualitative controls, there seems no way out yet ex- 
cept elimination of unmatched individuals. 

An important point not stressed by the author in discussing the value of 
“causal” analyses is that many useful predictions have been, and will con- 
tinue to be made without need for any causal interpretation. From the ex- 
ample of Christiansen’s ex post facto experiment, granted ordinary sampling 
theory applies, it is safer to bet on a high school graduate to become economi- 
cally adjusted than a nongraduate, regardless of why it is true. This can be 
useful information to employers, credit agencies, etc. 

Christiansen’s purpose, of course, was to try to answer a different question: 
Consider two persons alike in all respects. Send one to high school and not
the other. Will there be no difference in their later economic adjustment? 
While trained judgment might agree that she has answered this particular 
question substantially, an objective theory of inference is still lacking for 
solving such problems when the only data are from an ex post facto experi- 
ment. 


Production, Jobs and Taxes: Postwar Revision of the Federal Tax System to
Help Achieve Higher Production and More Jobs. Harold M. Groves. New York: 
McGraw-Hill Book Co., 1944. Pp. xv, 116. $1.25. 
REVIEW BY HERBERT STEIN 
Economist, Office of War Mobilization and Reconversion, 
Washington, D.C. 


IN the past two years there has been a flood of proposals for post-war
revision of the Federal tax system in the interest of high and stable output.
Production, Jobs and Taxes—a research study of the Committee for
Economic Development—presents one of the first and most carefully 
reasoned of these programs. Mr. Groves’ system would make the personal 
income tax a much more important source of Federal revenue and would 
eliminate or reduce discrimination among different kinds or uses of income. 
The major change suggested is the integration of the corporate profits tax 
with the personal income tax, corporation tax payments to be considered as 
advance payments on the income of stockholders. Postwar excise and sales 
taxes would be “discouraged.” Averaging of annual incomes over 5 or 10 
years, treatment of capital gains and losses like other income, flexibility in 
the computation of depreciation, elimination of tax exempt bonds, are all 
parts of a program for “reasonable classification and like treatment of those 
in like circumstances.” 

This proposal resembles closely the reforms once urged on the somewhat 
“old-fashioned” ground of equity. And, despite the implied promise of the 
title, production and jobs are not the main bases on which this tax plan is 
advanced. When all the imponderable propensities and elasticities have been 
pondered, when all the chips are down, in the difficult cases Mr. Groves looks 
primarily to “equity” for his conclusive argument. Moreover, many alterna- 
tive proposals which would deserve consideration as stimuli to “full employ- 
ment” are here dismissed with a brief reference to “equity” or to the proper 
functions of a government. This does not diminish, but rather enhances, 
the attractiveness of Mr. Groves’ position. “Full employment” is, of course, 
a basic goal of national policy. But it is gratifying that only the title and not 
the content of this book defers to the current indiscriminate preoccupation 
with jobs as the prime objective of every policy from school lunches to mili- 
tary training. 

When we have made the logical step to chief reliance upon the personal 
income tax as a device for like treatment of persons in like circumstances, 
the question of the relative treatment of persons in unlike circumstances—
the question of progression—still remains. At this point formal “equity” 
provides little guidance and discussion becomes necessarily rather incon- 
clusive. Mr. Groves proposes to retain a broad base for the income tax, with 
approximately wartime exemptions, for the sake of adequacy of revenue and 
“discipline.” He would lower surtax rates in the uppermost brackets to 
reduce the present deterrent to risk-taking. These revisions, together with 
other suggested changes, would produce a total tax system much more 
progressive than the prewar structure. 

But there is certainly a wide range of more or less progressive tax systems 
all of which would meet Mr. Groves’ objectives. Stagnationists may feel 
that Mr. Groves has left the range of choice unnecessarily wide by failing to 
give adequate weight to the consumption-stimulating effects of more 
progression. Those to whom increasing the nation’s real productive capacity 
is still a basic objective will miss the restraints upon progression once urged 
for the sake of “progress”—i.e., for the sake of improving the general wealth 
and welfare through capital growth, which requires saving. Mr. Groves seems
to share in a mild form the current view of investment as a convenient means 
of tidying up the economic system by sweeping unwanted savings under the 
carpet. 

Except in brief reference to the post-war transition, Production, Jobs and 
Taxes does not consider taxation as an aspect of budgetary policy. This 
subject will presumably be covered in the CED’s study of business fluctua- 
tions. 


The Probability Approach in Econometrics. Trygve Haavelmo (Research Associ- 
ate, Cowles Commission for Research in Economics). Supplement to Econo- 
metrica, Vol. 12, July 1944. Chicago: Econometric Society, University of Chicago, 
1944. Pp. viii, 118. $1.75. Paper. 
Review By R. L. ANDERSON 
Research Mathematician, Statistical Research Group, Princeton University. 
On Leave: Assistant Professor of Experimental Statistics and Agricultural 
Economics, North Carolina State College 

In this paper, the author indicates some of the methods which can be used
to connect economic theory and existing data. In the past, few attempts
have been made to do this. Haavelmo emphasizes the essential differences 
between the controlled experiments of the physical sciences and the uncon- 
trolled ones of economics, but he rejects the pessimistic view that tests 
of economic formulations can not be made because of this lack of control of 
the experiments. Instead it is the duty of the econometrician to develop 
theories which can be tested with the data on hand. This paper is a very 
important contribution to economic statistics, since it points out how the
various probability schemes which have been so useful to the biological statis- 
tician can be used with few alterations by economists. Coupled with the re- 
sults of Mann and Wald* and those now being prepared by the Cowles 
Commission for Economic Research at the University of Chicago,** this 
paper can be used as a basis for the development of stochastic economic 
theory. 

The author divides the problems of economic statistics into these cate-
gories: (a) Formulation of constant relationships, (b) Design of experiments,
(c) Tests of hypotheses, (d) Estimation of parameters and (e) Prediction. 

Economic theory should be as simple as possible, while at the same time 
it should be applicable to a variety of situations. The latter condition is 
spoken of as autonomy by the author. The economist must reconcile sim- 
plicity with autonomy in the formulation of his constant relationships. 
Haavelmo might have added that there is need for a good sampling tech- 
nique to test the constancy and repeatability of results. 

The design of the experiment and the theory to be tested are interrelated.
Since most economic data are fixed, the author might have added that the
design is often restricted to a good sampling program. It is fruitless to devise
a test of a given theory unless a suitable design can be devised utilizing
existing data. Haavelmo stressed the fact that existing data can not be used to
test all types of theories.

* “On the Statistical Treatment of Linear Stochastic Difference Equations.” Econometrica, 11:
173-220 Jl-O '43.
** These results will be published in the near future.

Much of the paper is devoted to the problem of formulating a general 
stochastic approach which could be used to fit economic theory into the 
framework of the Neyman-Pearson theory of testing hypotheses. The author 
shows how to set up random elements in economic formulations; these have 
been called “disturbances” by some economists. Economists are usually 
faced with the fact that the disturbances are not independent of the sys- 
tematic variables. For example, the fact that one miscalculates his savings 
today affects his investments tomorrow; the miscalculation may be random 
in some sense, but it does affect the plans for the future. Haavelmo’s empha- 
sis on the interdependence of the various parts of the economic process is 
one of the most important contributions he has made to realistic economic 
statistics. 

The treatment of the Neyman-Pearson theory is quite general and is well 
illustrated by a problem in trend fitting. The discussion of the type I and 
type II errors is excellent as is the example of the use of the power function 
in connection with the trend fitting problem. The author emphasizes that a 
general framework of admissible hypotheses should be set up before an ex- 
periment is made. The data of a given experiment should not be used in 
setting up these admissible hypotheses, but these data can be used to decide 
which theories from the admissible ones should be tested. However, I do not 
believe Haavelmo would exclude the use of results from a given experiment 
to reform the set of admissible hypotheses to be used in future experiments. 
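
As a rough illustration of a power function in a trend-fitting setting (a sketch under stated assumptions, not Haavelmo's own example), suppose y_t = a + b*t + e_t for t = 1, ..., n, with independent normal errors of known standard deviation sigma, and test the hypothesis b = 0 by a two-sided test at level alpha. The function name `trend_test_power` and all numerical values below are assumptions made purely for illustration.

```python
from statistics import NormalDist

def trend_test_power(beta, n, sigma=1.0, alpha=0.05):
    """Power of a two-sided level-alpha test of 'no trend' (beta = 0).

    Assumes y_t = a + beta*t + e_t, t = 1..n, with independent
    normal errors of KNOWN standard deviation sigma (an assumption).
    """
    std_normal = NormalDist()
    t_mean = (n + 1) / 2
    # Sum of squared deviations of the time index about its mean.
    sxx = sum((t - t_mean) ** 2 for t in range(1, n + 1))
    # Standardized distance of the true slope from the tested value zero.
    delta = beta * sxx ** 0.5 / sigma
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the standardized slope estimate falls in either tail.
    return std_normal.cdf(-z_crit - delta) + 1 - std_normal.cdf(z_crit - delta)

for slope in (0.0, 0.05, 0.1):
    print(slope, round(trend_test_power(slope, n=20), 3))
```

At beta = 0 the function returns the probability of a type I error (alpha); for any other beta, one minus the returned value is the probability of a type II error at that alternative.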

In addition to pointing out the need for considering all equations simul- 
taneously when estimating the parameters of a given process, the author 
develops a method of testing for the uniqueness of these estimates. He has 
extended the Gramian criterion for linear independence to solve this problem. 
It might be mentioned that the Cowles Commission group has called this 
problem one of “identifiability” of results and has developed a different 
approach to the solution of the problem. To the reviewer, the “identifi- 
ability” concept seems somewhat clearer, since it tends to connect the prob- 
lem to actual economic processes. For example, when we use historical price 
data to estimate the parameters of a demand equation, what are the requisite 
features of this equation which guarantee that the resultant estimates actu- 
ally identify it as a demand and not a supply or some other equation? 
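
The question raised here can be made concrete with a small simulation. The sketch below is neither Haavelmo's Gramian criterion nor the Cowles Commission procedure; it assumes a hypothetical linear market in which an observed variable z shifts supply but not demand, and all coefficients and function names are invented for the illustration.

```python
import random

def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

def simulate_market(n, seed=0):
    """Equilibrium prices and quantities from a hypothetical linear market.

    Demand: q = 10 - 1.0*p + u
    Supply: q =  2 + 1.0*p + 1.0*z + v   (z shifts supply only)
    """
    rng = random.Random(seed)
    prices, quantities, shifters = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)
        v = rng.gauss(0, 1)
        z = rng.gauss(0, 1)
        p = (10 - 2 + u - v - z) / (1.0 + 1.0)  # price equating demand and supply
        q = 10 - 1.0 * p + u                    # quantity read off the demand curve
        prices.append(p)
        quantities.append(q)
        shifters.append(z)
    return prices, quantities, shifters

p, q, z = simulate_market(20000)
# Regressing quantity on price alone mixes the two equations.
naive_slope = covariance(q, p) / covariance(p, p)
# Using only the price variation caused by the supply shifter isolates demand.
shifter_slope = covariance(q, z) / covariance(p, z)
print(naive_slope, shifter_slope)
```

Under these assumptions the simple regression slope tends to -1/3 in large samples, which is neither the demand slope (-1) nor the supply slope (+1), while the ratio based on the supply shifter tends to the demand slope. The feature that "identifies" the demand equation here is the exclusion of z from it.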

Finally, Haavelmo offers suggestions on how existing statistical theory can 
be used to make valid predictions and on how to compute confidence limits 
for these predictions. Although this problem is much more complicated for 
economic than for biological prognostication because of the interdependence 
of the sequences used in the process under consideration, a solution is shown 
to be possible. 


Management of Inspection and Quality Control. J. M. Juran (Formerly Chief of
Inspection Control Division, Western Electric Co., Inc.) New York: Harper & 
Bros., 1945. Pp. xv, 233. $3.00. 
Review By A. I. PETERSON
Quality Control Manager, Radio Corporation of America 
Harrison, New Jersey 

This book is a significant and well-written contribution to the current
literature of industrial quality control. As often stressed, manufacturing
faces ever greater precision requirements in complex products and processes. 
Further, screening inspection, scrap, and rework have become major factors 
in manufacturing cost. The parallel development of the science of control in 
technical and operating phases of manufacturing is of paramount impor- 
tance. 

Widely experienced in management, the author presents sound principles 
and a well-rounded evaluation of problems of quality specification, inspec- 
tion philosophy, lot acceptance sampling, control sampling, inspection 
organization and lines of responsibility, and, last but foremost, of the overall
management problem. Process control is presented at a level of practical 
investigation and direct adjustment, when statistical checks on generally 
developed processes and products depart from quality standards. Obviously, 
many industrial processes deal with physical characteristics that skilled 
operators can relate intuitively to necessary corrections, if promptly aware of 
conditions. The author rightly warns of the diminishing returns of widespread 
floor applications of control charts and mathematical-statistical analyses 
per se, while referring to the importance of training in mathematical statistics 
for key technical personnel responsible for overall guidance and special in- 
vestigation. 

Although the author conforms admirably to the stated scope of his book, 
it would have been of great value to have him elaborate upon the engineering 
phases of process operation and quality control. Much of manufacturing is 
dependent upon developmental or at least highly technical materials, special- 
ized processes, and intricately related variables and sequences. Such terms 
as plastics, ceramics, optics, electronics, powder metallurgy, etc., provide 
the implications. Design and operation here involve specialized and technical 
supervision. Variations and scrap are often due to instabilities propagated 
through complex sequences of processes and factors. It is then necessary to 
require higher orders of training in statistical theory and methods, not so 
much for the development of elaborate technique but for engineering visuali- 
zation of the dynamics of propagated variability. Many operating improve- 
ments can be resolved only in this manner. 

Likewise, in engineering design and specification such approach is neces- 
sary to evaluate inter-related tolerance ranges, proper reference coordinates, 
economically balanced specification limits, etc. Similarly, the projection of 
inspection methods and sampling requires technical and statistical reasoning 
beyond mathematical computation. 

The developments of practical quality control during the last few years
have been significant, and much of this work has not been published. As Mr. Juran
points out, statistical considerations and techniques have been well
developed. His book provides much needed examination of the overall problem
of management. Presentations of the engineering methods of process control
are still required.

In general, we have a great problem in education and training throughout 
industry for technical, managerial, and supervisory personnel. Mr. Juran’s 
book should be in every industrial library, and should be collateral reading in 
engineering schools. 


Elementary Statistics. Hyman Levy (Professor of Mathematics, Imperial College 
of Science, London) and E. E. Preidel (Assistant Lecturer in Mathematics). 
New York: Ronald Press Co., 1945. Pp. vii, 184. $2.25. (London: Thomas 
Nelson & Sons, Ltd., 1944. 5s.) 


REVIEW BY HENRY B. MANN
Brown University 

This book, although it was written as an elementary textbook in statistics,
can also be regarded as an attempt to explain to the layman the basic
ideas of statistics. The authors quite intentionally dispense with mathe- 
matical rigor and thus gain in simplicity. The book has the advantage of 
discussing only a few but very important statistics, namely: mean value, 
variance, correlation coefficient, and range. Thus the student is not con- 
fused by a lengthy discussion of various statistics like median, mode, har- 
monic and geometric mean, mean deviation and the like as has been custom- 
ary in elementary textbooks. The simplest significance tests are explained 
and the binomial, normal, and Poisson distributions are discussed. 

The book contains a short chapter on quality control which engineers may 
find useful. It may be remarked that in constructing control charts the au- 
thors assume that the mean and standard deviation of the measurements 
are known from previous observations. This enables them to make an exact 
statement about the proportion of points falling outside the control limits. 
The procedure differs however from the actual practice in which it is mostly 
necessary to determine the control limits from the sample itself. 
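
The distinction drawn above can be sketched numerically. The example assumes normally distributed measurements and three-sigma limits on sample means; the function names, the simulated data, and the use of a pooled within-sample variance (rather than the range-based estimate common in shop practice) are all assumptions made for illustration, not taken from the book.

```python
import math
import random
import statistics

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def limits_known(mu, sigma, n, k=3.0):
    """Limits for means of samples of size n when mu and sigma are known."""
    half_width = k * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width

def prob_outside(k=3.0):
    """Exact chance that a sample mean falls outside k-sigma limits (normal case)."""
    return 2.0 * (1.0 - normal_cdf(k))

def limits_estimated(samples, k=3.0):
    """Limits computed from the data: grand mean and pooled within-sample variance."""
    n = len(samples[0])
    grand_mean = statistics.fmean(statistics.fmean(s) for s in samples)
    pooled_var = statistics.fmean(statistics.variance(s) for s in samples)
    half_width = k * math.sqrt(pooled_var / n)
    return grand_mean - half_width, grand_mean + half_width

random.seed(1)
# Twenty-five hypothetical samples of five measurements each.
samples = [[random.gauss(10.0, 2.0) for _ in range(5)] for _ in range(25)]
known = limits_known(10.0, 2.0, n=5)
estimated = limits_estimated(samples)
print(known, estimated, prob_outside())
```

With known parameters the proportion of points outside three-sigma limits is exactly 2(1 - Φ(3)), about 0.27 per cent under normality. Limits estimated from the sample vary from one set of data to the next, so no such exact statement applies to them.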

The book does not discuss the problem of statistical inference; it treats 
significance tests as incidentals rather than as the central problem of sta- 
tistics. The examples given at the end of each chapter are not of any in- 
trinsic interest. It would have been easy to find more interesting illustra- 
tions from biology, agriculture, economics, psychology, engineering and other 
fields. 

The book can be recommended as a textbook for a short first course in sta- 
tistics. It can also be recommended to any reader who wishes to acquaint 
himself with the intuitive basis of statistics without studying its mathe- 
matical foundation. 


An Introduction to Statistical Analysis, Revised Edition. C. H. Richardson (Pro-
fessor of Mathematics, Bucknell University). New York: Harcourt, Brace &
Co., 1944. Pp. xiv, 498. $4.00. 
Review By J. H. CURTISS
Lieutenant, USNR, Bureau of Ships, Washington, D. C.* 

The earlier editions of the book, which appeared in 1934 and 1935, enjoyed
[illegible] popularity. The book was perhaps the best text of its kind
then available. The picture in statistics has greatly changed since 1935 and 
it is the purpose of the review to discuss and evaluate the Revised Edition 
entirely on its own merits in the setting of 1945 without reference to the 
purpose served by the earlier editions. 

It is the stated aim of this book to “present the fundamental notions of 
statistical analysis in such a manner that they can be comprehended by 
students who have had but little training in mathematics, and yet in such a 
way that they can be studied to advantage even by those who have had 
considerable mathematics.” From this and other evidence, the reviewer 
gathers that this book is probably intended as a text in a college course in 
elementary mathematical statistics to be taught by the Department of 
Mathematics. The contents cover what were formerly the traditional sub- 
jects for such a course, with various brief excursions into the field of ele- 
mentary college algebra. Calculus is avoided, except in certain exercises and 
in the discussion of the Gaussian distribution as an approximation to the 
binomial distribution. 

Assuming that the reviewer has correctly inferred the intended purpose of 
this book, it seems only fair to use a sort of double standard in evaluating 
the book, and to consider it separately as a textbook in elementary mathe-
matics and as a textbook in statistical analysis.

As a textbook in elementary mathematics, to be used for purposes of drill 
in formal operations, and as a technical basis for further mathematical work 
(but not necessarily in mathematical statistics), this book is well written 
and well organized. The algebraic exposition is skillfully arranged. Tabular 
forms and other similar devices are frequently used in clarifying derivations. 
The numerous exercises should give students plenty of practice in substitut- 
ing in formulas and a good workout in elementary algebra and arithmetic. 
The book is obviously the work of a skillful and competent teacher of college 
mathematics. 

But from the statistical point of view, the trouble with the book is that it
is just about fifteen years out of date (the reviewer is not attempting to sub- 
tract 1935 from 1945); and in a field in which the methodology is advancing 
as rapidly as it is in statistics, this is a pretty serious criticism. Lip service 
is paid by the author to the newer concepts of statistical inference in two or 
three introductory passages, but when he gets down to business in the body 
of the text, he goes right back to the old Handbook of Mathematical Statistics
and Rietz’s Mathematical Statistics. Perhaps the references sound the
keynote to this anachronism; Pearson’s Tables for Statisticians and Bio-
metricians is said to be indispensable for the advanced student, but Fisher
and Yates is nowhere mentioned; and the one reference for “Chi Square,”
which is given in a footnote on page 416, is to the Handbook and to Pearson’s
original paper published in 1900. And while everyone agrees that an author
should be allowed pretty free choice of subject material without adverse
criticism, nevertheless, a textbook in elementary statistics nowadays can
hardly afford to omit all the following topics (as this one does): Analysis of
variance, quality control, contingency tables, methods of solving the normal
equations adapted to machine computation, Poisson distribution, and estima-
tion by confidence intervals. Instead of these, we find a considerable fuss
made over skewness, kurtosis (the usual error being made concerning the
latter measure¹), the coefficient of variation, and how to interpolate for posi-
tional means.

* The opinions expressed in this article are those of the author and are not to be construed as
reflecting the views of the Navy Department or the Naval Service at large.

It is in the study of the relation between probability and statistics that the 
greatest advances have been made, and when the book skirts on such prob- 
lems, the effect is quite disconcerting. 

The one chapter on probability appears late in the book, after probability 
and random sampling have repeatedly been mentioned previously. The chap- 
ter is a modest one and the treatment is mainly intuitive, but even with these 
limitations it seems to be going a little far in the direction of brevity to 
“prove” the composition and addition theorems by apparently assuming that 
the expected value of a random variable is exactly the value it will assume! 
Yet with this introduction, plus the students’ imagination, it should cer- 
tainly have been possible to discuss adequately a few of the simpler concepts 
of statistical inference, such as confidence intervals. The author accurately 
explains the meaning of a probable error as regards the relation between 
sample statistic and population parameter (except in the case of regression 
equation, where the concept of a population line of regression never makes its 
appearance and incorrect probability statements are made about the sample 
line), but when the idea of the confidence interval is completely set up and 
ready for the kill, he stays his hand. To be sure the words “confidence 
limits” are used in one place, but incorrectly, as signifying the probability 
levels of a distribution. 

Another consequence of the weakness of this book in probability theory is 
that no distinction whatever seems to be made between the conceptual bases 
of correlation and regression. It is therefore of course not necessary for the 
author to caution the student as to the basic logical difference between the 
variables X and Y in a regression problem. However, the author does issue 
various unexplained warnings from time to time, such as the following sinister
statement made in connection with the standard and probable errors of
the correlation coefficient: “Since the assumptions underlying these formulas
are rather severe, they are to be used with care.” (Just what the severe as-
sumptions are, or how to use them with care, is left to the student’s good
judgment.)

¹ See Irving Kaplansky, “A Common Error Concerning Kurtosis” J. Am. Stat. Assn. 40:259 Je. '45.

It is the opinion of this reviewer that the job of writing a good textbook in
mathematical statistics at the elementary level, which will achieve the stated 
aims of this one, is not an impossible one; but that the author of such a text 
must realize that the derivations which fall within the scope of his work are 
not really of great value from the point of view of giving the student insight 
into the theory of statistics, and are useful mainly as drill in elementary 
formal mathematics. To be broadly useful, the work must therefore com- 
bine some mathematical elegance with plenty of good, accurate verbal 
exposition of the type which is beginning to appear now in our best applied 
statistics texts. This will be a hard job. It will be done by someone who is 
thoroughly at home in the work of R. A. Fisher, J. Neyman, E. S. Pearson, 
S. S. Wilks, W. A. Shewhart, H. Hotelling, and the other leaders of modern
mathematical statistics, and who has had practical experience in two widely 
separated fields: applied statistics and instruction in mathematics at the 
college level. The appearance of so many new statistics texts during the war 
gives rise to the hope that such a book may soon be published now that the 
war is over and the teaching of elementary mathematical statistics in our 
colleges will undoubtedly be resumed on a greatly augmented scale. 


Statistics: Collecting, Organizing, and Interpreting Data. Raleigh Schorling 
(Head of Department of Mathematics, University High School and Professor of 
Education, University of Michigan), John R. Clark (Professor of Education, 
Teachers College, Columbia University), and Francis G. Lankford, Jr. (Director
of Research, Public Schools, Richmond, Va.). Yonkers, N. Y.: World Book Co., 
1943. Pp. iv, 76. $0.44. Paper. 
REVIEW BY HELEN M. WALKER 
Professor of Education, Teachers College, Columbia University 

The publication of the first separate monograph on statistics designed
for use as a high school text is a notable event which can be better ap-
praised after an examination of the signs, appearing sporadically over the
last quarter century, of a slowly growing demand for such material. These
signs are so few that most of them can be mentioned in the space of a short 
review, yet they are highly significant. 

The mathematical graph was somewhat timidly and tentatively intro- 
duced into high school mathematics texts about the turn of the century but 
its use was not widespread until after 1923 when the report on the reorgani- 
zation of mathematics referred to below was published. The statistical graph 
was still more tardy in making its appearance in high school texts. While 
discussions of the statistical graph for the use of adults were readily avail- 
able no one seems to have ventured to expose high school pupils to it until 
about twenty-five years ago. What was the first high school text to make 
some use of the statistical graph I am not sure, but the first extensive presentation
of such material occurred in two books which were published about
1920, one by Schorling and Reeve, and the other by Rugg and Clark. At the 
time these books appeared, many teachers felt this new material was a very 
radical and somewhat dubious innovation, which had little place in a mathe- 
matics text. However they exercised great influence on subsequent texts and 
by about 1930 graphic methods of presenting statistical data were taught 
everywhere in the junior high school. 

Two articles published shortly after the appearance of these texts (L. E. 
Mensenkamp, “What Graphical and Statistical Material Should Be In- 
cluded in the Ninth-Grade Mathematics Course?,” School Science and Mathe- 
matics, October 1919; Truman Lee Kelley, “Elementary Statistics in High 
School Mathematics as a Socializing Agency,” School and Society, February 
21, 1920) are notable for their brevity. Apparently there was not yet much 
to be said on the subject. 

A new impetus toward teaching some statistics to high school students 
was provided by the report of a committee which worked under the auspices 
of the Mathematical Association of America from 1916 to 1923, “The Re- 
organization of Mathematics in Secondary Education.” It made a strong 
recommendation for placing a unit of statistics in the high school course and 
reported experimental courses having such a unit at the High School of 
Commerce in New York City, at the University of Minnesota High School, 
and at the Horace Mann School and the Lincoln School of Teachers College, 
Columbia University. However, the content of these units was not described 
in any detail and was apparently still very meager. 

In March 1924 Godfrey Thomson (“Should We Teach Statistics in the 
Senior High School?”, Mathematics Teacher) not only urged the more ex- 
tensive teaching of statistics but listed the goals he thought such teaching 
should serve and the procedures he would employ to achieve those goals. 
This was by far the most specific and detailed proposal yet published and is 
still important reading for anyone interested in planning such a unit. 

While teaching experimental courses in the mathematics department of 
the University of Kansas High School in 1922-25, I became convinced that 
pupils could not only understand but enjoy such study if only appropriate 
teaching material could be produced and if teachers could be trained to 
present it. The material developed at that time I later embodied in a 70- 
page chapter on statistics in a ninth-grade text Algebra: A Way of Thinking 
(Harcourt, Brace & Co., 1936). While this material was still in typewritten 
form, George Paley tried it out in a class at the Lew Wallace Junior High 
School in New York City and reported his experiences in High Points, 
September 1936. 

In the Sixth Yearbook (1931) of the National Council of Teachers of 
Mathematics, in a paper on “Mathematics and Statistics,” I gave what I 
considered to be the general principles for determining the content of any 
course in statistics on the high school level. 

Schlauch at the High School of Commerce in New York City has for years
taught statistics successfully. John Swenson, in his text on analysis (In-
tegrated Mathematics with Special Application to Analysis, Edwards Brothers,
Inc., 1935) showed what he was able to do at Wadleigh High School, New 
York City, with 12th grade pupils who were well trained in mathematics. 
Eugenie C. Hausle of the James Monroe High School, New York City, in 
“A High School course in Statistical Methods,” The Mathematics Teacher, 
January 1937, described a course she gave for several years in the 11th and 
12th grades. R. Drake in “Statistics for 9th-grade Pupils,” The Mathematics 
Teacher, January 1941, described a project which he carried on for six years 
at the University of Minnesota. 

In 1940 the Progressive Education Association, through the publication 
of a book by one of its committees (Mathematics in General Education, 
D. Appleton-Century Co.), said “ ... there are many reasons for believing 
that the scope of statistics as treated in the secondary schools may well be 
extended,” and, in discussing the need of the ordinary citizen for clear thinking
about group phenomena, said he must be thoroughly acquainted with what
is meant by such terms as central tendency, dispersion, associated variables, 
trend, and correlation. Unfortunately some of the illustrations presented are 
subject to criticism by anyone who understands statistical theory. Never- 
theless the book undoubtedly influenced many secondary school teachers to 
ask insistently for an appropriate text. 

The National Council of Teachers of Mathematics have in recent years 
frequently included a round-table discussion or a paper on the teaching of 
statistics in the high school at their annual meetings. The number of teachers 
interested in doing work in this still uncharted field has never been large, but 
those who have been present at such discussions have impressed me as well- 
trained, eager, and creative in spirit. Their reiterated demand has been, 
“When is some one who knows statistical method, knows the high school 
pupil, and is skilled in the art of writing text books going to prepare us a 
good text?” 

In 1942, at the Second Institute for Teachers of Elementary and Sec- 
ondary Mathematics held at Duke University over a period of 10 days, 
Douglas Scates gave a series of lectures on this theme, one of which was later 
printed under the title “Statistics—the Mathematics for Social Problems,” 
in The Mathematics Teacher, February 1943. This paper should be read and 
pondered by anyone who proposes either to teach statistics or to develop 
teaching materials for high school students. He considers the goals of such 
teaching to be: (a) to produce statistical literacy, (b) to accustom young 
people to doing their thinking about personal and social problems in terms of 
quantitative facts wherever appropriate, and (c) to familiarize young people 
with the processes of gathering data, and the elementary modes of inter- 
preting them. Only for students of more than average ability who have an 
intrinsic interest in the manipulation of figures, would he suggest as a fourth 
objective some technical skill in calculating and drawing diagrams. These 
objectives appear to be the ones dominating the monograph which is the 
subject of this review, approached by so long a developmental approach.
The Institute is again this summer emphasizing statistics as a subject for 
high school instruction, with Gertrude Cox in charge of that part of its pro- 
gram. 

There is no doubt in my mind that ultimately some work in statistics 
will be taught in a large proportion of our high schools. However for two 
reasons I feel apprehensive concerning that development although I believe
it to be inevitable. One reason is the almost complete lack of teachers who are 
competent both in the subject matter to be taught and in the art of teaching. 
An unskillful expositor who knows his subject may survive in a college but 
not often in the secondary school. A skillful but unscholarly teacher may do 
vast harm by imparting erroneous ideas. And make no mistake, teaching 
statistics in high school will be more rather than less difficult than teaching 
older students. The second reason for apprehension is the fear that textbook 
writers—who have committed enough blunders in college texts—would write 
for the high school watered-down versions of poor college texts. This is 
exactly what has happened in a number of recent high school mathematics 
texts which have incorporated a chapter on statistics. The material is too 
often neither theoretically correct nor psychologically sound. After reading 
the monograph by Schorling, Clark and Lankford, I feel greatly reassured.

The emphasis of the brochure is on interpretation, on obtaining and using 
statistical data to aid in making practical decisions, on ascertaining that 
data are comparable and relevant before comparisons are drawn. Relatively 
little emphasis is placed on formulas or computations. The data employed 
are richly varied and within the range of active interests of teen-age girls 
and boys. Pictures have been used effectively and they are pertinent, not 
merely something to divert attention from an otherwise dull page. 

The treatment of the normal curve as “a fundamental law of nature” is 
one of the most unfortunate features of the book. In my judgment the nor-
mal curve is an excellent topic to avoid in the ninth or tenth grade, no mat-
ter how large it may have bulked in college texts a teacher has studied. Cer-
tainly it is undesirable to create an impression that “you can be sure that if
scattering data of this general kind do not come close to a normal distribu- 
tion, then the sample from which the data were obtained was not truly 
representative of the entire group” (p. 42). 

The book loses a great opportunity to help students acquire the general 
ideas of sampling variability and unreliability of measurement. The omis- 
sion of any real attempt to develop these two concepts which are of such 
great practical importance is another weakness which should be corrected 
when the book is revised. Considerable caution is needed here to make sure 
that formal talk about standard errors is omitted and a genuine feeling for 
sampling variability is built up. It can be done, but not easily. 

Great resourcefulness has been shown in exploring statistical situations 
which are real to young people, in making them aware of the statistical 
nature of the world in which they live. The net effect of the text should 
be to create respect for statistical investigation, to develop intelligent “con-
sumers” of statistics. Statisticians should be grateful that the first separate
treatise intended for high school use is of such high quality. 


A First Guide to Quality Control for Engineers. E. H. Sealy (Ministry of Supply 
Advisory Service on Statistical Method and Quality Control, London). London: 
Ministry of Supply, 1943. Pp. 38. Gratis. Paper. 
REVIEW BY W. EDWARDS DEMING
Adviser in Sampling, Bureau of the Budget, Washington, D. C. 

This little book was produced mainly by Dr. E. H. Sealy while he was
with the Ministry of Supply, and who “has drawn freely on the advice
of his colleagues” (from the preface by J. R. Womersley). The foreword 
from H. J. Gough, Director General of Scientific Research and Development, 
states that “This small volume on Quality Control for Engineers is one of 
the outward signs of the efforts of the Ministry of Supply to assist the in- 
dustry of this country to reach yet new pinnacles of production and to make 
the fullest possible use of the limited available manpower.” The assistance to 
British production rendered by the Advisory Service on Statistical Method 
and Quality Control, under the direction of Gough, Womersley, and Sealy, 
is something that statisticians everywhere can well be proud of. The booklet 
was written, “primarily for use by engineers ...as a downright working 
guide for the man who, having decided that quality control is worth trying, 
wants to experiment with it on his own shop floor. It is not the object of this 
book to sell quality control to anybody, or to demonstrate the advantages... 
Neither . . . to detail any of the mathematical theory which lies behind the 
subject ...” The book accomplishes these aims with distinction and with 
the unmistakable masterful touch of an author who understands the under- 
lying mathematical theory and has contributed to it, but who does not 
flaunt it; one who also knows his way around in engineering and shop-prac- 
tice, and appreciates what is needed in the way of clarity. The book explains 
the author’s “modified control limits,” which successfully meet most of the 
difficulties encountered in the use of control charts in shops where tool- 
wear gives rise to a continual drift in the dimensions being cut. The modified 
control limits are action lines drawn on the chart for averages; their function 
is to show whether the product is meeting the specifications and to forewarn 
the operator of the impending need for regrinding or resetting before rejec- 
tions occur. The position of the modified control limits is determined by 
measuring inward from the drawing tolerances a distance Kw, where K is a 
constant dependent on sample size and easily found in a table on page 36, 
and w is the average range in the last 25 samples. (In emergency, fewer than 
25 samples can be used if 25 have not been taken since the last adjustment 
or change was made in the process.) The average range calculated for a series 
of samples, when multiplied by the factor K gives an interval of such width 
that the tails of any ordinary distribution of individual articles (or rather 
the distribution of the measured values on the individual articles) will just
fall inside of the drawing tolerances, and meet the specifications. But if 
the mean of a sample falls beyond the modified control limits, then the tail 
of the distribution for individual values will be in danger of over-reaching 
the drawing tolerances, giving rise to defective material and rejections. 
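
The placement rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (each action line lies a distance Kw inside the corresponding drawing tolerance); the function name, the sample ranges, and the value K = 0.5 are assumptions made for the example and are not taken from Sealy's table on page 36.

```python
from statistics import mean

def modified_control_limits(lower_tol, upper_tol, sample_ranges, k):
    """Return (lower, upper) action lines for the chart of sample averages.

    Each line is placed a distance k * w inside the drawing tolerance,
    where w is the average range of the recent samples.
    """
    w = mean(sample_ranges)  # average range, e.g. of the last 25 samples
    lower = lower_tol + k * w
    upper = upper_tol - k * w
    if lower >= upper:
        raise ValueError("process spread is too wide for these tolerances")
    return lower, upper

# Illustrative numbers only; k = 0.5 is a placeholder, not a tabled value.
ranges = [0.004, 0.006, 0.005, 0.005]
lo, hi = modified_control_limits(1.000, 1.020, ranges, k=0.5)
print(lo, hi)
```

A sample mean falling outside the returned pair would be the signal, in the reviewer's description, that regrinding or resetting is due.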

The modified control limits are perhaps slightly mis-named: one might 
guess at first sight that they were mere modifications of the normal control 
limits. Actually the normal control limits may be placed in the usual manner 
which need not be described here. It is true, however, that after the major 
causes of erratic variability have been discovered and removed with the aid 
of the normal control limits, Sealy does dispense with them and relies on the 
modified limits together with the range-chart unless statistical control again 
becomes a problem. Having dispensed with the normal limits, the average 
level of past samples is not computed at all. There is, of course, loss of in- 
formation, but if the setting of the machine is likely to fluctuate anyhow, 
but not enough to cause rejections, this loss is not serious. Attention is then 
focussed on drifts of the points on the mean-chart toward one of the modified 
limits, or on a point that fell outside them or outside the control limits on 
the range-chart. In other words, effort is concentrated on meeting the 
specifications with relaxation of problems dealing with the state of control, 
once it has been attained. A good example is furnished by a machine that 
is far too accurate for the job (p. 19): the modified limits warn the operator 
when resetting is advisable to avoid rejections, but they do not force him to 
maintain the narrower and (in this case) unnecessary limits of statistical 
control. 

The booklet is filled with numerous illustrative examples, concisely and 
clearly discussed. Included is an example in the use of the group control 
chart for controlling multi-spindle autos (p. 25). Tipped in at the back are a 
number of life-size charts to accompany the illustrations in the text. The 
book contains the requisite tables for placing control limits. It is without 
doubt a distinct contribution to statistics in industry. The Ministry of
Supply, Dr. Sealy, and colleagues, deserve the thanks of statisticians and
industry for making it available. 


Operating Results of Department and Specialty Stores in 1943. Stanley F. Teele.
Bulletin No. 119. Boston: Bureau of Business Research, Harvard Business
School, 1944. Pp. vi, 35. $2.50.


Expenses and Profits of Limited Price Variety Stores in 1943, Chains and In- 
dependents. Edward C. Bursk. Bulletin No. 120. Boston: Bureau of Business Re- 
search, Harvard Business School, 1944. Pp. vi, 38. $1.00. 


An Analysis of Operating Data for Small Department Stores, 1938-1942. Eliza- 
beth A. Burnham. Bulletin No. 121. Boston: Bureau of Business Research, 
Harvard Business School, 1944. Pp. vi, 42. $1.50. 
REVIEW BY E. R. HAWKINS
Chief, Distribution Cost Unit, Bureau of Foreign and Domestic Commerce
Washington, D. C.


The first two of these bulletins are the current issues in familiar series
which have continued for many years (24 years in the case of the department
store series and 13 years in the case of the variety chain series).
The principal findings may be summarized as follows:


1. Sales of reporting department stores increased 16.3 per cent in 1943
   over 1942; specialty store sales increased from 20 per cent to 35 per
   cent, depending on size, and variety store sales increased 6 per cent.

2. Largely as a result of the increase in sales volume, expense ratios were
   lower. Department store expense ratios fell from 32.05 per cent of
   sales in 1942 to 29.4 per cent in 1943; specialty store ratios dropped
   from 33.75 to 31.15 per cent, and variety chain ratios declined slightly
   from 28.82 to 28.60 per cent. Manpower shortages, restrictions on
   plant expansion, and service reductions contributed to the decrease in
   expense ratios.

3. Gross margin percentages remained about the same. The typical
   department store gross margin percentage declined slightly from 38.7 per
   cent of sales in 1942 to 38.4 per cent in 1943; specialty stores increased
   their gross margins from 38.75 per cent of sales to 39.2 per cent; variety
   chains gross margins dropped slightly from 36.66 to 36.32 per cent.

4. As a result of constant gross margin percentages and lower expense
   ratios, net profits before taxes of department and specialty stores were
   higher in 1943 than in 1942. For department stores, net gain before
   Federal tax on income and excess profits increased from 9.75 per cent
   of sales in 1942 to 11.4 per cent in 1943; specialty store net gain ratios
   increased from 7.2 per cent to 10.15 per cent. Variety chains showed a
   slightly lower rate of earnings before taxes, with a decline from 10.23
   per cent of sales in 1942 to 10.15 per cent in 1943. Dollar earnings were
   higher, however. Dollar earnings, before taxes, of department and
   specialty stores were, of course, substantially higher in 1943 than in
   1942. At about the same percentages of markup, these stores were able
   to increase their earnings because total dollar gross margins increased
   more than total dollar expenses.

5. Out of earnings, department stores were able to pay greater Federal
   taxes than ever before and still retain greater net gains in 1943 than in
   1942. Such taxes amounted to 7.7 per cent of sales in 1943, compared
   with 5.95 per cent in 1942. Net gains after taxes were 3.6 per cent of
   sales in 1943 and 3.4 per cent in 1942. Variety chains paid about the
   same taxes on income and excess profits (as a per cent of sales) in the
   two years, amounting to 6.20 per cent. Net gains after taxes were 3.92
   per cent in 1943, compared with 4.03 per cent in 1942. Data on Federal
   income taxes were not available from a sufficient number of specialty
   stores to make general averages possible.

The bulletins present additional data on number of transactions, per cent
of credit sales, use of self service methods, basement sales, average value of 
sales, departments opened and closed, and detailed breakdowns of expenses 
by natural and functional divisions. The expense breakdowns are useful as 
benchmarks against which individual firms can measure their own opera- 
tions. In order to make such comparisons more valid, the figures are broken 
down by size of store and size of city. There remain, of course, dangers for 
the individual store in assuming that its own expense ratios should be the 
same as any average, however carefully the other stores might be selected 
for comparability. Presumably the readers of the Harvard bulletins do not, 
however, need such cautions on the use of average ratios as Walter Mitchell 
so admirably outlined in his introduction to Dun and Bradstreet’s Standard 
Ratios for Retailing. 

In respect to benchmarks for management guidance, the bulletins are 
somewhat more useful than the annual “Departmental Merchandising and 
Operating Results” of the National Retail Dry Goods Association in that the 
Harvard figures are broken down in greater detail by function, and some- 
what less useful in that they are not broken down by departments. There is, 
in fact, a rather fundamental difference between expense studies of the 
Harvard type, which present average expense ratios for the business as a 
whole, and the type in which the store allocates its expenses to departments 
or commodities. The latter type of expense analysis is useful to the store 
even without comparison of its figures with trade averages, because the de- 
partment or commodity earns a revenue against which the expense can be 
charged to ascertain whether or not the department is profitable. This can- 
not be done with functional expenses, since functions do not directly earn 
revenue. More important, perhaps, is the point that functional analysis 
measures the efficiency with which various activities are being performed, 
but does not indicate whether these activities are in profitable directions. 
After a store has made its own departmental analysis, there is, of course, 
additional benefit to be derived from comparing its departmental results 
with those of other stores, as shown in the NRDGA reports. 

The Harvard bulletins, however, are useful for management guidance in a 
broader sense. They show in broad outline what has happened to the trade 
during the year, and are designed to contribute to an understanding of basic 
relationships and long-run trends in retailing. One of the conclusions that 
appears to be developing from the studies is that while a high level of sales 
in a department store is associated with a high expense ratio, a rapid increase 
in sales leads to a reduction in expense ratios. The explanation tentatively ad- 
vanced is that until department store managements become accustomed to 
higher sales volumes they do not allow dollar expenditures to rise propor- 
tionately. Further analysis would be required to establish this proposition. 
It seems more likely that in “normal” times high expenses (especially oc- 
cupancy and sales promotion) are necessary to create and maintain the sales 
volume of the largest stores, while during the war period increased sales have
been obtained without increased efforts. The largest department stores have 
not been able to increase their sales to the same extent as the smaller stores, 
however. 

The Harvard sample is perhaps not adequate to pursue further the analy-
sis of size differences. As is pointed out in the Appendix, year-to-year com-
parisons between corresponding volume groups are hazardous because of
changes in the make-up of the several volume groups. To overcome this 
difficulty a separate analysis was made of the 1942 and 1943 results of 55 
stores that had reported in both years, but this sample is perhaps too small 
to give conclusive results. In fact, the entire department store sample, 367 
stores, is considerably smaller than the sample of 1,400 reporting sales to the 
Federal Reserve Board. The difference in sample is reflected in the fact that 
the Harvard figures show an increase of 16.3 per cent in department store 
sales in 1943 compared with 1942, while the Federal Reserve Board reported 
an increase of 12 per cent. The variety chain coverage in the Harvard 
studies is more complete, representing about 85 per cent of total sales, and 
the sales increase of 6.05 per cent in 1943 over 1942 is the same as that 
reported by the Department of Commerce. 

In the variety chain report is an interesting chart showing the relationship 
between total consumer expenditures and sales in variety chains and depart- 
ment stores. It is indicated that department store sales in 1943 kept pace 
with increased total expenditures of consumers, but variety chain sales did 
not. This is in line with the study “Retail Sales and Consumer Incomes,” 
by Louis Paradiso, which appeared in the October 1944 issue of the Survey 
of Current Business. In the latter study, sales of various kinds of retail stores 
are related to disposable consumer income, rather than consumer expendi-
tures, and it is shown that over a long period of years the relationship has
been remarkably stable. During the war period, however, sales of durable 
goods stores have not kept pace with rising consumer incomes, because of 
shortages of goods. Variety stores have perhaps been harder hit by shortages 
than have department stores, which have increased their sales of ready-to- 
wear and accessories to offset decreases in home furnishings and other dur- 
able goods lines. 

The third bulletin, “An Analysis of Operating Data for Small Department 
Stores,” is not an annual study, but a special report covering the years 
1938 to 1942. It calls attention to the trend of retail business towards the 
smaller cities, and presents figures indicating that the smaller department 
stores may be extremely profitable. Earnings for 1942 amounted, in the 
stores studied, to a 20 per cent return on investment. The report concludes 
that there exists a genuine consumer liking for stores of this kind, and a real 
opportunity in many communities for small independent department stores. 


Statistical Methods in Industry. L. H. C. Tippett (Statistician to the British
Cotton Industry Research Association). London: Iron and Steel Industrial Re- 
search Council, British Iron and Steel Federation, 1943. Pp. 74. 2s. 6d. Paper. 


REVIEW BY P. S. OLMSTEAD
Bell Telephone Laboratories, Inc., Murray Hill Laboratory 
Murray Hill, New Jersey 

This pamphlet includes an introductory lecture on “Probability Theory
and Statistical Method” by Professor E. S. Pearson and a series of six
subsequently given by Mr. Tippett at the University of Sheffield, Sheffield,
England, at the request of the Open Hearth Committee of the Iron and
Steel Industrial Research Council, together with a summary of the general 
discussion that followed the last lecture of the series. The presentation is 
clear and nontechnical in character and examples are chosen mainly from the 
steel industry. The subject matter covered is indicated by the consecutive 
subheadings: frequency distributions, errors of random sampling, control 
charts and quality control, analysis of variation, correlation, partial and 
multiple correlation, sampling in practice, and design of experiments. 

Although the author is very careful to explain what he does, he has not 
always explained why he does it. A good example is the use of N − 1 in cal-
culating variances and standard deviations. The engineer who wishes to
present the sample average and standard deviation, i.e., the root mean 
square of the sample values with respect to the mean, may not be satisfied 
with the statement on page 15, “either the definition I have given must be 
adopted, or a correction must be made.” 
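
The distinction at issue can be made concrete with a short Python sketch. It contrasts the root mean square deviation about the sample mean (divisor N) with the standard deviation computed with divisor N − 1; the latter divisor is the one that makes the sample variance an unbiased estimate of the population variance. The function names and the data are illustrative assumptions, not taken from the pamphlet.

```python
from math import sqrt

def rms_deviation(xs):
    """Root mean square of deviations from the sample mean (divisor N)."""
    m = sum(xs) / len(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def sample_std(xs):
    """Standard deviation with divisor N - 1."""
    m = sum(xs) / len(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up measurements
n = len(data)
print(rms_deviation(data))
print(sample_std(data))
# The "correction" is the factor sqrt(N / (N - 1)) applied to the first value.
print(rms_deviation(data) * sqrt(n / (n - 1)))
```

The last two printed values agree, which is the sense in which one either adopts the N − 1 definition or applies a correction to the root mean square figure.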

On page 34, in discussing the analysis of variance, the author indicates 
that a control chart could be made to show the importance of primary 
components of variation relative to residuals. Nevertheless, he prefers the
analysis of variance “for disentangling the effects of variations arising from 
a number of sources.” Whether or not he proposed this as a tool for the 
production engineer is not clear. Similarly, in dealing with correlation and 
partial and multiple correlation, no inkling is given concerning who should 
be entrusted with their use. On page 52 he does state, “in dealing with the 
data of fuel consumption . . . , I showed that the time for which the furnace 
is used during one period has an important effect on the fuel consumption 
during the next period—but I was only led to investigate this because some- 
one with technical knowledge told me that such an effect might exist.” This 
is an important observation. The obvious conclusion is that in dealing with 
complex problems the skilled statistician and the skilled scientist or engineer 
must work together. As a team they can do more than as two individuals 
working independently. The reason for this is that by working together 
they increase the sum total of past experience to be used with the numerical 
values in the new sample to reach predictions that apply primarily to future 
samples and their numerical values. 

In another place, page 55, a similar observation is made, “I hope you will 
see that sampling, in practice, is a rather complicated business. It demands 
not only a knowledge of the statistical principles underlying the methods, 
but also a good technical knowledge of the field of application and of the
sources of variation.” This means that there are pitfalls galore for either 
the statistician or the engineer if either attempts to go alone with inadequate 
preparation in the other’s field. In contrast, in the case of the control chart, 
the author’s statement, page 34, indicates that a production engineer may 
easily grasp the fundamental principles and apply them successfully. 

Design of experiments is discussed as a tool of research that has as its 
aims, identification of the magnitudes of component variations, and eco- 
nomical experimental arrangement. Replication and randomization are ex- 
plained by example leading up to the Latin square. The reader will note on 
page 59 that the author somewhat unfortunately uses accuracy as a synonym 
for precision in discussing the economics of the Latin square arrangement. 

The bibliography on page 60 is restricted to British sources except for 
Shewhart’s Economic Control of Quality of Manufactured Product. It is quite 
apparent, therefore, that it does not go into such things as run charts, 
tolerance limits, single and double sampling schemes, or lag correlations. 

In common with many British publications, this pamphlet includes the 
discussion that followed the lectures. In a number of places, this brought 
out information of a technical nature to justify certain statistical conclusions 
and in others raised doubts concerning the statistical processes proposed. 
There is reason to believe that many papers published in this country would 
make more interesting reading if similar discussions were included. 

In conclusion it may be stated that these lectures, prepared so carefully by 
an authority in applied statistics, should prove interesting particularly to 
the development engineer because they show how he may be helped by 
working with a competent statistician. By inference, they also make it clear 
that either the engineer or the statistician working alone is apt to draw 
erroneous conclusions that may be avoided by working together and thereby 
making use of the technical skills of both. Although these lectures show how 
some elementary statistical methods may be used in industry, some may 
feel that they are not an adequate introduction to the role of statistical meth- 
od in so far as it applies to sampling consumer wants, research and develop- 
ment, design, specification including the setting of tolerance limits, inspec- 
tion, and operational research to determine that standards are satisfactory, 
adequate, dependable, and economic. Obviously, all of these come under the 
general problem of quality control in industry. 


Elementary Statistics for Students of Education and Psychology, Third Printing. 
Edward B. van Ormer (Associate Professor of Psychology, Pennsylvania State 
College) and Clarence O. Williams (Associate Professor of Education). New York: 
Longmans, Green & Co., 1945. Pp. viii, 111. $1.75. Paper. 
REVIEW BY PAUL S. DWYER
Associate Professor of Mathematics, University of Michigan 


This is a manual which is designed to accompany courses in educational
measurements, experimental psychology, clinical psychology, industrial 
psychology, etc. The need for it is based on the premises (a) that the usual 
texts in these courses do not contain sufficient material on statistics and (b)
that textbooks on statistics are too detailed for these courses. It is the aim 
of the manual to give the student “a meaningful and practical grounding in 
the minimum statistical knowledge expected of students majoring in educa- 
tion or psychology.” The emphasis throughout is upon “a reasonable under- 
standing of statistical methods and terms” and upon the avoidance of 
“involved mathematics.” Numerous references are inserted throughout the 
manual to direct the student to more formal and detailed presentations. 

After introductory sections the manual continues with a discussion of such 
topics as tabulation, graphic presentation, central tendency, variability, 
normal probability curve, comparisons of individuals and groups, and cor- 
relation. There is little material on sampling theory and what little there is 
appears in the form of probable error (or standard error) of the mean, the 
difference of two means, and of the correlation coefficient. Several appen- 
dices are included. 

Since the topics discussed are very elementary, the effort is made to explain 
in detail just what the steps are and how the calculations are performed. 
On page 15, for example, the process of calculating the mean of a grouped 
distribution with the use of class interval units measured from an arbitrary 
origin is described in eight steps using approximately half a column of space. 
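
For comparison, the same calculation is brief when written out in code. The Python sketch below follows the usual short method (deviations of class midpoints from an arbitrary origin, expressed in class-interval units); it is an assumed rendering of the procedure, not a transcription of the manual's eight steps, and the frequency table is invented.

```python
def grouped_mean(midpoints, freqs, width, origin):
    """Mean of a grouped distribution by the coded-deviation short method."""
    n = sum(freqs)
    # Deviation of each class midpoint from the arbitrary origin,
    # measured in class-interval units.
    coded = [(m - origin) / width for m in midpoints]
    mean_coded = sum(f * d for f, d in zip(freqs, coded)) / n
    # Convert back to the original units.
    return origin + width * mean_coded

mids = [52, 57, 62, 67, 72]   # class midpoints (illustrative)
freqs = [2, 5, 8, 4, 1]       # class frequencies (illustrative)
print(grouped_mean(mids, freqs, width=5, origin=62))
```

Any class midpoint may serve as the origin; the result is the same as the direct frequency-weighted mean of the midpoints.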

It is to be expected that a mathematical statistician would be somewhat 
critical of the results of an attempt to explain methods which are inherently 
mathematical without the use of mathematical concepts which are essential 
to the clarity. The present reviewer reacts as expected on this point. As an 
example he feels that the concept of deviation from the mean should be de- 
fined carefully (either by formula or otherwise) before defining the average 
deviation (p. 20), “The average deviation is the average or mean of all the 
deviations of all the separate scores from the central tendency of the dis- 
tribution.” It is implied here (and seems not to be stated explicitly) that the 
deviation is always positive. One has the feeling that the “complicated 
mathematics” has been simplified by ignoring the essential concept of di- 
rected deviation. This feeling is amplified when one discovers (p. 70) that 
the term “deviation” suddenly implies that a sign (plus or minus) is a part 
of it. 
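
The reviewer's point about directed deviations can be checked numerically. In the Python sketch below (function names and scores are illustrative assumptions), the signed deviations from the mean sum to zero, so an "average deviation" is informative only if the signs are dropped before averaging.

```python
def signed_deviations(xs):
    """Directed deviations of each score from the mean."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def average_deviation(xs):
    """Mean of the absolute deviations from the mean."""
    devs = signed_deviations(xs)
    return sum(abs(d) for d in devs) / len(devs)

scores = [3.0, 5.0, 6.0, 10.0]          # made-up scores
print(signed_deviations(scores))        # signs retained
print(sum(signed_deviations(scores)))   # signed deviations cancel
print(average_deviation(scores))        # signs dropped before averaging
```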

This reviewer is not entirely satisfied with some of the basic explanation. 
The explanation of class limits used in grouping leaves something to be de- 
sired. He also feels that the authors do not always separate the basic hy- 
potheses of a problem from the assumptions (linear interpolation for in- 
stance) used in getting an approximate answer to the problem. 

On the whole the authors are to be commended for their efforts to em- 
phasize the understanding of basic statistical terms and methods. Though 
it deals only with the more basic concepts and makes no contact at all with 
modern sampling theory and methods, this book is undoubtedly a useful 
manual for those for whom it is designed. 


Quality Through Statistics. A. S. Wharton (Technical Adviser Statistics, Philips
Lamps Ltd.). London: Philips Lamps Ltd., 1945. Pp. iv, 60. 6s. Paper. (Arling- 
ton, Va.: Gryphon Press, 1945. $1.50.) 
REVIEW BY H. F. DODGE
Bell Telephone Laboratories, Inc., New York, N. Y. 

This book sets forth in readable fashion statistical methods that have been
found most useful in improving manufacturing quality in the factories
of Philips Lamps Ltd.

Writing for those engaged in production work, the author emphasizes the 
point that only the simplest quality control schemes, devoid of “the jargon 
and symbols of higher mathematics” and readily comprehended in the fac- 
tory, will prove satisfactory in practice. He accordingly sets about to de- 
scribe a number of simple procedures that have proven their worth as well as 
others that appear to have promise. The determination to keep the presenta- 
tion at an application level will be welcomed by many, though in places this 
may have been responsible for some inadequacies of technical explanation. 

The problem of controlling quality is broken down under three general 
subjects: (a) Batch-by-batch receiving inspection of raw and processed ma- 
terials used by a manufacturing unit, (b) Control inspection in process, and 
(c) Analysis of cumulative results of inspections to establish standards of 
performance and efficiency. 

On the subject of batch-by-batch inspection, a special table of sampling 
plans has been devised to provide what is spoken of as a “control” at any 
one of several values of per cent defective, ranging from ½% to 10%. A
feature is the required selection of five equal random samples from each 
batch, with provision for taking a second set of five such samples under 
specified conditions. Just what the “½% control” or “5% control” means for
a particular plan is not clear from the context, although the impression is 
gathered that this represents a permissible upper bound to the per cent 
defective in an individual batch. However, a few calculations indicate that 
for the specified procedure and plans given in the table, the chances of ac- 
cepting an individual batch having a per cent defective equal to that shown 
in the table heading generally lie between 50 in 100 and 75 in 100—roughly 
2 in 3 for many of the plans. Taking a specific case—2% control, an initial 
set of five samples of size 10, and a critical number (P) of 1—the chances of 
acceptance are about 5 in 10 for a 2% defective batch; and about 1 in 10 
for a 5% defective batch, that is, a batch 2½ times as bad as the 2% table
figure. The latter feature, chances of 1 in 10 of accepting lots considerably
poorer than the table figure, is inherent, of course, in many sampling plans, 
though the reader might infer from the author’s analysis of other published 
tables that this was an objectionable feature to be overcome. The inclusion 
of a few numbers or curves showing probability of acceptance vs. per cent 
defective in submitted batches, would certainly have clarified the significance 
of the sampling tables. However, the tables undoubtedly do provide a sys-
tematic basis for regulating the quality of incoming material, and as the
author brings out, this is greatly to be desired in contrast with prevailing
practices of 100% inspection and “spot” inspection. Further, his program 
for detailed analysis of the percentage of rejects, subdivided into types of 
faults, can be warmly endorsed, judging from experience gained along like 
lines in this country. Some question exists in the mind of the reviewer 
with respect to a formula on page 11 for determining probable quality of a 
batch, which “only functions in connection with the Philips Table.” Ex- 
pression of this formula in words rather than in symbols would appear to 
hinder rather than aid understanding. 
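
The kind of "numbers or curves showing probability of acceptance vs. per cent defective" asked for above can be produced with very little computation. The Python sketch below does so for a hypothetical single-sampling plan (one sample of n articles, accept if at most c are defective), assuming the batch is large enough for the binomial distribution to apply. It is not the five-sample Philips procedure and is not expected to reproduce the reviewer's figures; the plan n = 50, c = 1 is chosen only for illustration.

```python
from math import comb

def prob_accept(p, n, c):
    """P(at most c defectives in a sample of n) for batch fraction defective p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))

# A few points on the operating characteristic of the hypothetical plan.
for pct in (0.5, 1, 2, 5, 10):
    pa = prob_accept(pct / 100, n=50, c=1)
    print(f"{pct:>4}% defective: P(accept) = {pa:.3f}")
```

Tabulating or plotting such values against per cent defective shows at a glance how sharply a plan discriminates between good and bad batches.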

On the subject of controlling quality during production, several simple 
control chart systems are presented with well-designed forms that should re- 
quire a minimum of paper work. All follow the usual pattern, covered by 
publications in this country and abroad, of inspecting hourly or half-hourly 
samples of 5, 10, or 20 articles. Control limits are generally based on proba- 
bility values of 1 in 1,000 and 1 in 40 in contrast with the 3-sigma and 2- 
sigma limits used here. An apparently effective fraction defective control 
chart system is described in some detail; its control limits are obtained pain- 
lessly without mathematics and records kept very simply by entering crosses 
in ruled squares instead of plotting on graph paper. For process control in- 
spections made on a measurement basis, the book confines itself to the 
simplest form of control chart, in which individual measurements are plotted, 
rather than averages and ranges, for periodic samples of 5. The book strongly 
recommends the use of this simple system, one which has been used with 
success in certain classes of work here. The manner of setting action and 
warning lines on the control chart, whereby the engineering tolerance limits 
are presumed to represent 1 in 1,000 chance limits for individuals may be 
questioned technically as a standard practice, in view of the widely vary- 
ing relationship between precision of process and closeness of tolerance 
limits encountered in production. This over-simplification may thus be 
found too restrictive. In spite of these matters, this section of the book is full 
of ideas which show how the practical man is thinking on this general sub- 
ject. 
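
The relation between the probability limits and the sigma limits mentioned above can be illustrated with Python's standard library. The sketch assumes a normal distribution and assumes that the stated chances (1 in 1,000 and 1 in 40) apply to each tail separately; both assumptions are made for the example and are not confirmed by the book under review.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

# Sigma multiples corresponding to per-tail chances of 1 in 1,000 and 1 in 40.
for tail in (1 / 1000, 1 / 40):
    print(f"tail probability {tail:.4f} -> {z.inv_cdf(1 - tail):.2f} sigma")

# Per-tail chances implied by 3-sigma and 2-sigma limits.
for k in (3, 2):
    print(f"{k}-sigma limit -> tail probability {1 - z.cdf(k):.5f}")
```

Under these assumptions the two conventions place the lines at nearly the same distances from the central line.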

The third general subject treated is one that needs to be emphasized time 
and again, namely, that we should make more use of collective inspection 
data (a) for improving our knowledge of the performance of individual 
machines, of individual operators, and of whole departments, and (b) for 
establishing over-all standards of performance. Too often inspection results 
are used merely for the immediate purpose of disposing of the day’s product. 
The book draws from experience along these lines and presents a number of 
specific schemes, together with charts and record forms for summarizing the 
results of analyses. A quality bonus plan utilizing the control chart is out- 
lined. Means for setting up targets for daily scrap and budgetary control are 
discussed. Brief mention is made of systems of quality grading. As the book 
indicates, a majority of these schemes have had actual trial, and have been 
found successful in varying degrees. 

The book as a whole has been successful in explaining the usefulness of
statistical methods in improving quality and does this almost entirely with- 
out mathematics. While many of the schemes outlined have appeared in one 
form or another in various publications, the book represents a contribution, 
for among other things it has sorted out and presented under one cover a 
set of specific control techniques that have been found highly useful. All in 
quality control work will be interested in the simplified charts and forms 
presented. Where greater refinements are wanted, reference may be made to 
the more technical literature cited. 


A Guide to Utilization of the Binomial and Poisson Distributions in Industrial 
Quality Control. Holbrook Working (Economist, Food Research Institute, Stan- 
ford University). Stanford, Calif.: Stanford University Press, 1943. Pp. 15. 
$0.25. Paper. (London: Oxford University Press. 2s.) 
Review By H. G. Romig
Bell Telephone Laboratories, Inc., New York, N. Y. 
This pamphlet, although small, contains a wealth of valuable information.
It consists of four parts covering: (a) binomial distribution, (b) devices
for facilitating use of the binomial distribution, (c) the Poisson distribution,
an aid to its utilization, and (d) approximations in the application of statis- 
tical theory. 

The first part indicates how individual and cumulative terms of the bi- 
nomial distribution may be computed and the field of application of this 
distribution. Since these probability values are basic to much of sampling 
theory covering attributes, the material in this article, presenting for the 
most part material scattered in different sources, should be mastered by 
those entering this field. Reference is given to Simon’s I_Q charts for
determining point binomial values. These charts, however, are very restricted in their
applications since only five charts are available covering sample size values 
from n =1 to 500 for five P values. 
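
As a minimal illustration of the computation this first part describes, the individual and cumulative terms of the binomial distribution can be obtained directly from the defining formula. The sketch below is not taken from the pamphlet; the function names are hypothetical and chosen only for this example.

```python
from math import comb


def binomial_term(k, n, p):
    """Probability of exactly k occurrences in a sample of n (point binomial)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cumulative(c, n, p):
    """Probability of c or fewer occurrences in a sample of n."""
    return sum(binomial_term(k, n, p) for k in range(c + 1))


if __name__ == "__main__":
    # Illustrative values only: a sample of 50 with fraction defective .02.
    n, p = 50, 0.02
    for c in range(4):
        print(c, round(binomial_term(c, n, p), 5), round(binomial_cumulative(c, n, p), 5))
```

Direct summation of this kind is practical for moderate n; for large samples one would ordinarily turn to published tables or charts, which is the need the pamphlet addresses.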

It would be highly desirable to have a table of binomial values similar to 
Molina’s table covering the Poisson distribution. Some of these probability 
values are at present available in a slightly different form in published 
tables. However, no mention is made in the text or bibliography that cumu- 
lative probability values for the binomial distribution may be obtained from 
Karl Pearson’s “Tables of the Incomplete Beta Function,” for n =1 to 50 
and p =.01 to .99. 

The actual probability values for control limits based on pn + 3σ_pn and
pn − 3σ_pn are discussed for various values of pn, n, and p to indicate
the difference between the point binomial probability values and the cor- 
responding normal law probability value of .00135. This detailed analysis of 
probability values is invaluable to those using control limits and shows the 
impossibility of setting exact control limits based on a fixed probability 
value. 
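
The point can be checked numerically. The following sketch computes the exact binomial probability of a count falling above the upper 3-sigma limit and sets it beside the normal-law value of .00135. It assumes the standard deviation is taken as the square root of pn(1 − p), which may differ from the form used in the pamphlet; the function name is hypothetical.

```python
from math import comb, floor, sqrt

NORMAL_TAIL = 0.00135  # one-sided normal-law probability beyond 3 sigma


def upper_tail_beyond_3_sigma(n, p):
    """Exact binomial probability of a count strictly above pn + 3*sigma.

    Assumption: sigma = sqrt(n * p * (1 - p)).
    """
    limit = n * p + 3 * sqrt(n * p * (1 - p))
    first_outside = floor(limit) + 1  # smallest integer count above the limit
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(first_outside, n + 1)
    )


if __name__ == "__main__":
    # Illustrative combinations of n and p; compare each with NORMAL_TAIL.
    for n, p in [(50, 0.02), (100, 0.05), (500, 0.01), (500, 0.10)]:
        print(n, p, upper_tail_beyond_3_sigma(n, p), NORMAL_TAIL)
```

Because the count can take only whole-number values, the exact probability moves in jumps as n and p change, which is why no limit can be made to correspond to one fixed probability value.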

The areas in which the Poisson distribution may be used to determine 
approximate probability values for the binomial values are discussed in
detail. The errors arising from its use for pn = 1 and pn = 2 appear in Table
2 for p ≤ .2 and are depicted on an excellent diagram portraying clearly the
three elements: (a) relative frequency, (b) sets of p and n values for pn = 1
and pn = 2, and (c) number of occurrences per sample. The Poisson
distribution appears as the limiting distribution. This same excellent diagram
form is used to show the changes in the shape of the distribution for pn = 0
to pn = 8. Poisson probability limits with P = .00135, .005, .1, .9, .995, and
.99865 are given on this second diagram. These two diagrams, to the best of
my knowledge, have not appeared elsewhere and should be found to be one 
of the most valuable parts of this paper, giving a clear-cut picture of the 
nature of the binomial and Poisson distributions. 
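
The size of the error incurred in using the Poisson distribution in place of the binomial can be sketched as follows. The code is an illustration only and does not reproduce the pamphlet's Table 2; the function names and the choice of the largest absolute difference between corresponding terms as the error measure are assumptions made for this example.

```python
from math import comb, exp, lgamma, log


def binomial_term(k, n, p):
    """Probability of exactly k occurrences in a sample of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_term(k, m):
    """Poisson probability of exactly k occurrences with expectation m > 0."""
    return exp(-m + k * log(m) - lgamma(k + 1))


def max_poisson_error(n, p):
    """Largest absolute difference between binomial and Poisson terms (m = pn)."""
    m = n * p
    return max(abs(binomial_term(k, n, p) - poisson_term(k, m)) for k in range(n + 1))


if __name__ == "__main__":
    # Sets of p and n values giving pn = 1, with p no greater than .2.
    for n, p in [(5, 0.2), (10, 0.1), (20, 0.05), (100, 0.01)]:
        print(n, p, max_poisson_error(n, p))
```

With pn held fixed, the error shrinks as p decreases and n increases, in keeping with the Poisson distribution's role as the limiting form.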

The choice of P values only for discussing control limits is unfortunate. 
Using P =.001 and P =.999, in line with other presentations, would appear to
be more useful in practice than P =.00135 and P =.99865. It would be help- 
ful to have presented the 2-sigma and 3-sigma control limit values since 
these have been found most useful in practice, requiring little time for com- 
putation. If they were used, their associated probability values would be in 
the range ordinarily assumed satisfactory for practical purposes. For
example, the upper 3-sigma control limit curve for values of pn of 5 or less would be
slightly less than .995 but for larger values of pn would have values about
halfway between .995 and .99865. As is pointed out in the text, it is impos- 
sible to achieve the probability values stipulated, and they are used essen- 
tially as guides in selecting proper values of c for control purposes. The 
author should have emphasized the fact that these probability values are 
true for the Poisson distribution and are therefore only approximations to 
the binomial probability values. He does point out that in determining these 
probability values, they cannot be considered as being exact probabilities 
when the sample represents a considerable fraction of the lot. This concept 
may be misleading, depending upon the nature of the probability under
discussion. It would have been wise for the author to have extended his remarks,
as many are confused by the fact that the probability of acceptance for
product of p quality under a given sampling criterion differs from the
probability of acceptance for a single lot of p quality. In fact, many have
considered them identical. 
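
One reading of this distinction can be sketched numerically. It is assumed here that "product of p quality" means a process turning out defectives with probability p, so that the number in a sample follows the binomial distribution, while "a single lot of p quality" means a lot of N units containing exactly D = pN defectives, so that the number in a sample drawn without replacement follows the hypergeometric distribution. The function names and the sampling plan (sample size n, acceptance number c) are hypothetical.

```python
from math import comb


def accept_prob_product(n, c, p):
    """Probability of c or fewer defectives in a sample of n from product of quality p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def accept_prob_lot(lot_size, n, c, defectives):
    """Probability of c or fewer defectives in a sample of n drawn without
    replacement from one lot holding exactly `defectives` defective units."""
    favorable = sum(
        comb(defectives, k) * comb(lot_size - defectives, n - k)
        for k in range(min(c, defectives, n) + 1)
    )
    return favorable / comb(lot_size, n)


if __name__ == "__main__":
    # Illustrative plan: sample 50, accept on 1 or fewer defectives, p = .05.
    n, c, p = 50, 1, 0.05
    print("product:", accept_prob_product(n, c, p))
    for lot_size in (100, 200, 1000, 10000):
        print("lot of", lot_size, ":", accept_prob_lot(lot_size, n, c, int(p * lot_size)))
```

Under these assumptions the two probabilities nearly agree when the sample is a small fraction of the lot and separate as that fraction grows.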

Two charts for the Poisson distribution are presented covering: (a)
probability limits, and (b) cumulative probability values. This second chart is a
reproduction of the same chart appearing in the Bell System Technical Jour- 
nal, January 1941, as noted. The first chart is derived from it for ease of 
application, but is restricted in the sense that it gives only a series of values 
covering six probability values, whereas the original chart covers a range of 
probability values from P =.00001 to .99999. The directions for use of this 
modified chart are clear. Similar directions might have been included cover- 
ing the second chart indicating its field of application since it may be con- 
sidered the basis for deriving the other. 

The paper is very readable and the points brought out are of sufficient
importance to justify its amplification in certain instances. The charts are on 
a sufficiently large scale so that it is possible to easily obtain readings from 
them. Taken as a whole, the paper may be considered an excellent
presentation.